
Return extracted_str from extract_data() when no template is found (Python)?

Open Whaoo opened this issue 3 years ago • 4 comments

Hi,

Is there a way to return extracted_str (the full PDF text as a str) when no template is found for the PDF?

I saw in main.py that, in debug mode, extracted_str is exactly what I want to collect. Getting it back would save me from calling pdf2text again and storing the result a second time.

Is there any way to return it from extract_data() if no templates are found for the .pdf?
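One possible workaround, sketched here under assumptions rather than taken from the library: wrap the text reader in an object that remembers the last text it produced, and hand that object to the extraction call. `CachingReader`, `fake_pdf_to_text` and `fake_extract_data` are hypothetical stand-ins so the sketch runs without invoice2data installed; whether the real extract_data() accepts such an object as its input module depends on the installed version and should be checked.

```python
from typing import Callable, Optional


class CachingReader:
    """Wraps a text-extraction callable and remembers its last output."""

    def __init__(self, to_text: Callable[[str], str]) -> None:
        self._to_text = to_text
        self.last_text: Optional[str] = None

    def to_text(self, path: str) -> str:
        # Store the text before returning it so the caller can read it later.
        self.last_text = self._to_text(path)
        return self.last_text


def fake_pdf_to_text(path: str) -> str:
    """Hypothetical stand-in for a real PDF-to-text reader."""
    return f"raw text of {path}"


def fake_extract_data(path: str, templates: list, input_module) -> object:
    """Hypothetical stand-in: returns a dict on a match, False otherwise."""
    text = input_module.to_text(path)
    for template in templates:
        if template["keyword"] in text:
            return {"issuer": template["issuer"]}
    return False


reader = CachingReader(fake_pdf_to_text)
result = fake_extract_data("foo.pdf", templates=[], input_module=reader)
if not result:
    # The text was read only once and is still available here.
    print("No template matched; extracted text:", reader.last_text)
```

The point of the design is that the PDF is converted once: the caller keeps a reference to the reader and reads `last_text` after the call, whatever the extraction returned.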

Many thanks

Whaoo avatar Sep 08 '22 16:09 Whaoo

Is this what you are looking for, or something to get inspiration from? I didn't test it.

https://github.com/OCA/edi/pull/399/files#diff-652ac3ae132c668bf2ac61903174bbc0c254c98bf549aac7cad47a515259ed32R70-R128

bosd avatar Feb 13 '23 10:02 bosd

Maybe we could make invoice2data more object oriented?

# Use static method
templates = Invoice2Data.read_templates("templates/")

i2d = Invoice2Data()
try:
    i2d.extract_data("foo.pdf", templates=templates)
except Exception as e:
    print('Failed to extract data: ' + str(e))
    print('Extracted text: ' + i2d.get_extracted_text())
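To make the proposal concrete, here is a minimal, self-contained sketch of what such a class could look like. It is not the library's API: `Invoice2DataSketch`, `ExtractionError` and the idea of a template being a plain callable are all assumptions made for illustration, and the text reader is injected so the example runs without any PDF tooling.

```python
from typing import Callable, Dict, List, Optional

# Assumption: a "template" is a callable returning extracted fields or None.
Template = Callable[[str], Optional[Dict[str, str]]]


class ExtractionError(Exception):
    """Hypothetical error raised when no template yields a result."""


class Invoice2DataSketch:
    """Hypothetical object-oriented wrapper that keeps the extracted text."""

    def __init__(self, to_text: Callable[[str], str]) -> None:
        self._to_text = to_text
        self._extracted_text: Optional[str] = None

    def extract_data(self, path: str, templates: List[Template]) -> Dict[str, str]:
        # Keep the raw text on the instance before trying any template.
        self._extracted_text = self._to_text(path)
        for template in templates:
            data = template(self._extracted_text)
            if data is not None:
                return data
        raise ExtractionError(f"no template matched {path}")

    def get_extracted_text(self) -> Optional[str]:
        return self._extracted_text


i2d = Invoice2DataSketch(to_text=lambda path: f"raw text of {path}")
try:
    i2d.extract_data("foo.pdf", templates=[])
except ExtractionError as e:
    print("Failed to extract data: " + str(e))
    print("Extracted text: " + str(i2d.get_extracted_text()))
```

Because the text is stored before any template runs, it stays available after a failure, which is what the original question asks for.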

rmilecki avatar Feb 18 '23 22:02 rmilecki

Hi @rmilecki, I'm looking for a way to get the details of the parsing error (no template found, missing required field, ...). For the time being, that information is only in the log and is not accessible when using invoice2data as a library.

What I don't understand in your code is that, AFAIK, extract_data doesn't raise an error. Or did I miss something?

try:
    i2d.extract_data("foo.pdf", templates=templates)
except Exception as e:
    print('Failed to extract data: ' + str(e))
    print('Extracted text: ' + i2d.get_extracted_text())
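If raising is not desired, an alternative is to return a small result object that says why extraction failed. The sketch below is only an illustration of that idea: `Status`, `ExtractionResult` and `classify` are hypothetical names, and the two failure reasons are the ones mentioned in this comment, not an exhaustive list from the library.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    OK = "ok"
    NO_TEMPLATE = "no template found"
    MISSING_FIELD = "missing required field"


@dataclass
class ExtractionResult:
    """Hypothetical result carrying the outcome, the raw text and details."""

    status: Status
    text: str
    data: Dict[str, str] = field(default_factory=dict)
    missing: List[str] = field(default_factory=list)


def classify(
    text: str, data: Optional[Dict[str, str]], required: List[str]
) -> ExtractionResult:
    """Turn a raw extraction outcome into a structured result."""
    if data is None:
        # No template produced any data for this text.
        return ExtractionResult(Status.NO_TEMPLATE, text)
    missing = [name for name in required if name not in data]
    if missing:
        return ExtractionResult(Status.MISSING_FIELD, text, data, missing)
    return ExtractionResult(Status.OK, text, data)


result = classify("Invoice 42", {"invoice_number": "42"}, ["invoice_number", "amount"])
print(result.status.value, result.missing)
```

A caller can then branch on `result.status` and still read `result.text`, without parsing the log.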

legalsylvain avatar Feb 21 '23 22:02 legalsylvain

Hi guys, nice to see my question interests other people. I'll try using what you pushed, @rmilecki :)

Whaoo avatar Mar 01 '23 11:03 Whaoo