Unable to Convert Custom Invoice Template
I'm working on customizing the invoice template for the invoice extraction process, but I'm unable to extract the expected fields from the attached PDF using the custom template.
Input PDF
Expected output, something like:
{
  "fields": {
    "invoice_number": "AUTOSAL4406",
    "proforma_invoice_date": "26/12/2024",
    "order_number": "AUTOSAL4406",
    "order_date": "26/12/2024",
    "seller_details": {
      "name": "Impex",
      "address": "Noida, Uttar Pradesh, INDIA",
      "ABN": "4589",
      "email": "[email protected]"
    },
    "buyer_details": {
      "name": "Tester Edit",
      "email": "[email protected]",
      "address": "ABU DHABI"
    },
    "country_of_origin": "NOIDA, INDIA",
    "country_of_final_destination": "AUSTRALIA",
    "port_of_loading": "Noida, Uttar Pradesh",
    "port_of_discharge": "BRISBANE",
    "items": [
      {
        "container_number": "1",
        "packing_qty": "3.00",
        "description": "This is a text",
        "HS_code": "57021000",
        "unit_price": "234.00",
        "total_price": "702.00"
      },
      {
        "container_number": "1",
        "packing_qty": "3.00",
        "description": "This is a text",
        "HS_code": "57021000",
        "unit_price": "234.00",
        "total_price": "702.00"
      }
    ],
    "total_price_AUD": "1,404.00",
    "tolerance": "This product belongs to impex docs"
  }
}
My current template is:
issuer: Impex
fields:
  invoice_number: "PROFORMA INVOICE NUMBER AND DATE\\s+PAGE NUMBER\\s+([A-Za-z0-9]+)"
  date: "26/12/2024"
  amount: "1,404.00"
  sales_order_number: "SALES ORDER NUMBER AND DATE\\s+([A-Za-z0-9]+)"
  sales_order_date: "SALES ORDER NUMBER AND DATE\\s+[A-Za-z0-9]+\\s*(\\d{2}/\\d{2}/\\d{4})"
  port_of_loading: "PORT OF LOADING\\s+(.*?)\\s+COUNTRY OF ORIGIN"
  port_of_discharge: "PORT OF DISCHARGE\\s+(.*?)\\s+BUYER"
  country_of_origin: "COUNTRY OF ORIGIN\\s+([A-Za-z]+)"
  country_of_final_destination: "COUNTRY OF FINAL DESTINATION\\s+([A-Za-z]+)"
  buyer_name: "BUYER\\s+(.*?)\\s+Email"
  buyer_email: "BUYER\\s+.*?Email:\\s*(\\S+)"
  seller_name: "SELLER\\s+(.*?)\\s+ABN"
  seller_email: "SELLER\\s+.*?Email:\\s*(\\S+)"
  seller_abn: "ABN:\\s*([0-9]+)"
  amount: "TOTAL\\s+.*?AUD\\s*([\\d,]+\\.\\d{2})"
  tolerance: "TOLERANCE\\s*:\\s*(.*?)\\s*(?=PAYMENT TERMS)"
  payment_terms: "PAYMENT TERMS\\s*:\\s*(.*?)\\s*(?=SPECIFICATION)"
  specification: "SPECIFICATION\\s*:\\s*(.*?)\\s*(?=WE CONFIRM)"
  origin_confirmation: "WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN"
  signatory: "Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD"
tables:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\\s+This product.*?\\s+(?P<qty>\\d+\\.\\d+)\\s+(?P<description>This is a text)\\s+(?P<hs_code>57021000)\\s+(?P<unit_price>234\\.00)\\s+(?P<line_total>702\\.00)"
options:
  currency: AUD
  decimal_separator: "."
keywords:
  - "PROFORMA INVOICE"
  - "Impex"
  - "AUSTRALIA"
  - "INDIA"
Current output:
{ 'amount': 1404.0,
'country_of_final_destination': 'NOIDA',
'country_of_origin': 'COUNTRY',
'currency': 'AUD',
'date': datetime.datetime(2024, 12, 26, 0, 0),
'desc': 'Invoice from Impex',
'invoice_number': 'Impex',
'issuer': 'Impex',
'origin_confirmation': 'WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN',
'payment_terms': 'This product belongs to impex docs',
'sales_order_number': 'Noida',
'seller_abn': '4589',
'signatory': 'Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD',
'specification': 'This product belongs to impex docs',
'tolerance': 'This product belongs to impex docs'}
Hey @bosd / @m3nu / @alexis-via,
Thanks so much for creating this fantastic library!
I'm struggling to create the custom template described above: the extraction does not produce the expected output. Could you guide me on how to adjust the template so that it extracts the fields I need?
Your help would be greatly appreciated!
Thanks in advance!
You may have more luck here using the lines parser:
lines:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\s+This product.*?\s+(?P
(Quick reply from phone, untested)
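It can also help to test the patterns against the extracted text with plain Python before putting them into the template. The sketch below is only an illustration: `SAMPLE_TEXT` is a made-up stand-in for the real PDF text (the column order and spacing are assumptions), and `first_match`, `parse_items` and `LINE_RE` are hypothetical helper names, not part of invoice2data. It shows one plausible reason why `country_of_origin` came back as `'COUNTRY'`: if the labels sit on one row and the values on the next, `\s+` skips straight to the next label. It also shows a per-line pattern with generic named groups instead of hard-coded values.

```python
import re

# Hypothetical stand-in for the text extracted from the PDF.
# The real layout (column order, spacing) is an assumption.
SAMPLE_TEXT = """\
COUNTRY OF ORIGIN        COUNTRY OF FINAL DESTINATION
NOIDA, INDIA             AUSTRALIA
DESCRIPTION OF GOODS
1   3.00   This is a text   57021000   234.00   702.00
1   3.00   This is a text   57021000   234.00   702.00
TOTAL   AUD 1,404.00
"""

# One item per line; groups are generic so any row with this shape matches.
LINE_RE = re.compile(
    r"^(?P<container_number>\d+)\s+"
    r"(?P<packing_qty>\d+\.\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<hs_code>\d{8})\s+"
    r"(?P<unit_price>[\d,]+\.\d{2})\s+"
    r"(?P<total_price>[\d,]+\.\d{2})$"
)


def first_match(pattern, text):
    """Return group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


def parse_items(text, start=r"DESCRIPTION OF GOODS", end=r"^TOTAL"):
    """Collect one dict per item line found between the start and end markers."""
    start_m = re.search(start, text)
    if not start_m:
        return []
    rest = text[start_m.end():]
    end_m = re.search(end, rest, flags=re.MULTILINE)
    block = rest[:end_m.start()] if end_m else rest
    items = []
    for line in block.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            items.append(m.groupdict())
    return items


if __name__ == "__main__":
    # The field pattern from the template grabs the next label, not the value.
    print(first_match(r"COUNTRY OF ORIGIN\s+([A-Za-z]+)", SAMPLE_TEXT))  # COUNTRY
    for item in parse_items(SAMPLE_TEXT):
        print(item)
```

Running the same check on the actual text extracted from the PDF should show what follows each label, which makes it easier to decide where each capture group has to start.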