invoice2data

Unable to Convert Custom Invoice Template

Open rupesh881 opened this issue 11 months ago • 2 comments

I'm working on customizing the invoice template for the invoice extraction process, but I'm unable to extract the expected fields from the attached PDF using the custom template.

Input PDF

sample.pdf

Expected output, something like this:

{
	"fields": {
		"invoice_number": "AUTOSAL4406",
		"proforma_invoice_date": "26/12/2024",
		"order_number": "AUTOSAL4406",
		"order_date": "26/12/2024",
		"seller_details": {
			"name": "Impex",
			"address": "Noida, Uttar Pradesh, INDIA",
			"ABN": "4589",
			"email": "[email protected]"
		},
		"buyer_details": {
			"name": "Tester Edit",
			"email": "[email protected]",
			"address": "ABU DHABI"
		},
		"country_of_origin": "NOIDA, INDIA",
		"country_of_final_destination": "AUSTRALIA",
		"port_of_loading": "Noida, Uttar Pradesh",
		"port_of_discharge": "BRISBANE",
		"items": [
			{
				"container_number": "1",
				"packing_qty": "3.00",
				"description": "This is a text",
				"HS_code": "57021000",
				"unit_price": "234.00",
				"total_price": "702.00"
			},
			{
				"container_number": "1",
				"packing_qty": "3.00",
				"description": "This is a text",
				"HS_code": "57021000",
				"unit_price": "234.00",
				"total_price": "702.00"
			}
		],
		"total_price_AUD": "1,404.00",
		"tolerance": "This product belongs to impex docs"
	}
}

My current template is:

issuer: Impex
fields:
  invoice_number: "PROFORMA INVOICE NUMBER AND DATE\\s+PAGE NUMBER\\s+([A-Za-z0-9]+)"
  date: "26/12/2024"  
  amount: "1,404.00" 
  sales_order_number: "SALES ORDER NUMBER AND DATE\\s+([A-Za-z0-9]+)"
  sales_order_date: "SALES ORDER NUMBER AND DATE\\s+[A-Za-z0-9]+\\s*(\\d{2}/\\d{2}/\\d{4})"
  port_of_loading: "PORT OF LOADING\\s+(.*?)\\s+COUNTRY OF ORIGIN"
  port_of_discharge: "PORT OF DISCHARGE\\s+(.*?)\\s+BUYER"
  country_of_origin: "COUNTRY OF ORIGIN\\s+([A-Za-z]+)"
  country_of_final_destination: "COUNTRY OF FINAL DESTINATION\\s+([A-Za-z]+)"
  buyer_name: "BUYER\\s+(.*?)\\s+Email"
  buyer_email: "BUYER\\s+.*?Email:\\s*(\\S+)"
  seller_name: "SELLER\\s+(.*?)\\s+ABN"
  seller_email: "SELLER\\s+.*?Email:\\s*(\\S+)"
  seller_abn: "ABN:\\s*([0-9]+)"
  amount: "TOTAL\\s+.*?AUD\\s*([\\d,]+\\.\\d{2})"
  tolerance: "TOLERANCE\\s*:\\s*(.*?)\\s*(?=PAYMENT TERMS)"
  payment_terms: "PAYMENT TERMS\\s*:\\s*(.*?)\\s*(?=SPECIFICATION)"
  specification: "SPECIFICATION\\s*:\\s*(.*?)\\s*(?=WE CONFIRM)"
  origin_confirmation: "WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN"
  signatory: "Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD"
tables:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\\s+This product.*?\\s+(?P<qty>\\d+\\.\\d+)\\s+(?P<description>This is a text)\\s+(?P<hs_code>57021000)\\s+(?P<unit_price>234\\.00)\\s+(?P<line_total>702\\.00)"
options:
  currency: AUD
  decimal_separator: "."
keywords:
  - "PROFORMA INVOICE"
  - "Impex"
  - "AUSTRALIA"
  - "INDIA"

Current Output

 { 'amount': 1404.0,
  'country_of_final_destination': 'NOIDA',
  'country_of_origin': 'COUNTRY',
  'currency': 'AUD',
  'date': datetime.datetime(2024, 12, 26, 0, 0),
  'desc': 'Invoice from Impex',
  'invoice_number': 'Impex',
  'issuer': 'Impex',
  'origin_confirmation': 'WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN',
  'payment_terms': 'This product belongs to impex docs',
  'sales_order_number': 'Noida',
  'seller_abn': '4589',
  'signatory': 'Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD',
  'specification': 'This product belongs to impex docs',
  'tolerance': 'This product belongs to impex docs'}
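One possible reading of the output above, offered as a guess rather than a diagnosis: values such as 'COUNTRY' and 'NOIDA' are what these patterns would capture if the extracted text keeps the PDF's column layout, with two labels on one line and their values on the next. The sketch below reproduces that with Python's `re` module. The `text` string is invented for illustration and is not taken from sample.pdf, and `fixed_origin` is a hypothetical alternative pattern, not a tested template entry.

```python
import re

# Hypothetical layout-preserving text (an assumption, not the real PDF text):
# two labels share one line and their values share the next line.
text = (
    "COUNTRY OF ORIGIN          COUNTRY OF FINAL DESTINATION\n"
    "NOIDA, INDIA               AUSTRALIA\n"
)

# The patterns from the template, written as Python raw strings.
origin = re.search(r"COUNTRY OF ORIGIN\s+([A-Za-z]+)", text)
destination = re.search(r"COUNTRY OF FINAL DESTINATION\s+([A-Za-z]+)", text)

print(origin.group(1))       # COUNTRY (the next label, not the value)
print(destination.group(1))  # NOIDA   (the first column's value)

# Hypothetical alternative: skip the rest of the label line, then capture
# the first column of the next line up to a run of two or more spaces.
fixed_origin = re.search(r"COUNTRY OF ORIGIN[^\n]*\n\s*(.+?)\s{2,}", text)
print(fixed_origin.group(1))  # NOIDA, INDIA
```

Checking the real extracted text first (for example by running the same text extraction the tool uses and printing the result) would show whether this column layout is actually what the patterns are matching against.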

rupesh881 • Jan 26 '25 08:01

Hey @bosd / @m3nu / @alexis-via ,

Thanks so much for creating this fantastic library!

I'm currently struggling with creating a custom template as described above. Despite my efforts, I'm not able to achieve the expected output from the invoice extraction process. Would you mind guiding me on how to adjust the template to correctly extract the fields I need?

Your help would be greatly appreciated!

Thanks in advance!

rupesh881 • Jan 26 '25 08:01

You may have more luck here using the lines parser.

lines:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\\s+This product.*?\\s+(?P<qty>\\d+\\.\\d+)\\s+(?P<description>This is a text)\\s+(?P<hs_code>57021000)\\s+(?P<unit_price>234\\.00)\\s+(?P<line_total>702\\.00)"

(Quick reply from phone, untested)

bosd • Jan 27 '25 12:01
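To make the start/end/per-line idea concrete, here is a minimal sketch in plain Python. It is not invoice2data's implementation: `parse_lines`, `START`, `END`, `LINE` and the sample `text` are all hypothetical, and the real layout of sample.pdf may differ. It only shows the general approach of cutting out the block between two markers and applying one named-group regex to each row.

```python
import re

# Hypothetical extracted text; the actual text of sample.pdf is unknown.
text = """\
DESCRIPTION OF GOODS
1   3.00   This is a text   57021000   234.00   702.00
1   3.00   This is a text   57021000   234.00   702.00
TOTAL   AUD 1,404.00
"""

START = re.compile(r"DESCRIPTION OF GOODS")
END = re.compile(r"TOTAL")
# Generic patterns instead of literal values, so every row can match.
LINE = re.compile(
    r"(?P<container_number>\d+)\s+(?P<qty>\d+\.\d+)\s+(?P<description>.+?)\s+"
    r"(?P<hs_code>\d{8})\s+(?P<unit_price>[\d,]+\.\d{2})\s+"
    r"(?P<line_total>[\d,]+\.\d{2})"
)


def parse_lines(text):
    """Return one dict per row found between the START and END markers."""
    start = START.search(text)
    if start is None:
        return []
    end = END.search(text, start.end())
    if end is None:
        return []
    block = text[start.end():end.start()]
    items = []
    for row in block.splitlines():
        match = LINE.search(row)
        if match:
            items.append(match.groupdict())
    return items


items = parse_lines(text)
for item in items:
    print(item)
```

The body regex in the template hardcodes literal values such as 57021000 and 234\.00, so it can only ever match that one row; character-class patterns like the ones in `LINE` are one way to let each row of the table match on its own.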