Using the tesseract4 option when used as a Python library?

Open erkin98 opened this issue 3 years ago • 3 comments

How can I use the tesseract option when using invoice2data as a library?

erkin98 avatar Mar 10 '21 06:03 erkin98

Hey @erkin98, I haven't tried anything except pdftotext, but you should be able to specify it as the third argument to the extract_data function, which is defined as: extract_data(invoicefile, templates=None, input_module=pdftotext)

Docstring shows:

input_module : {'pdftotext', 'pdfminer', 'tesseract'}, optional
        library to be used to extract text from given `invoicefile`
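
For illustration, a minimal sketch of that call could look like the following. It is untested against a real invoice and rests on a few assumptions: that `extract_data` can be imported from the `invoice2data` package, that the input modules live under `invoice2data.input.<name>`, and that `extract_data` accepts the module object itself (as the default `input_module=pdftotext` in the signature suggests). The helper name `extract_with_module` and its `extract_fn` / `input_module` parameters are hypothetical; they exist only so the wiring can be tried without the library installed.

```python
import importlib


def extract_with_module(invoicefile, module_name="tesseract", templates=None,
                        extract_fn=None, input_module=None):
    """Call extract_data with a chosen input module (hypothetical helper).

    extract_fn and input_module can be injected for testing; when left as
    None they are looked up in invoice2data at call time.
    """
    if extract_fn is None:
        # Assumption: invoice2data exposes extract_data at package level.
        from invoice2data import extract_data as extract_fn
    if input_module is None:
        # Assumption: input modules are importable as invoice2data.input.<name>.
        input_module = importlib.import_module(
            f"invoice2data.input.{module_name}"
        )
    # input_module is the third parameter of the signature quoted above.
    return extract_fn(invoicefile, templates=templates,
                      input_module=input_module)
```

With invoice2data installed, `extract_with_module("invoice.pdf")` would then try the tesseract module; if your version expects something else for `input_module`, adjust accordingly.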

RossK1 avatar Mar 18 '21 03:03 RossK1

I've tried and it does not work. Has anyone managed to use it?

Carlos314159 avatar Jun 17 '22 14:06 Carlos314159

I use tesseract as an input module in: https://github.com/OCA/edi/pull/567

I tried tesseract4 from the command line but got an error:

convert-im6.q16: unable to open image `/tmp/tmp_a3u7owr.tiff': No such file or directory @ error/blob.c/OpenBlob/2874.
convert-im6.q16: no images defined `tiff:-' @ error/convert.c/ConvertImageCommand/3258.

Note: the title of this issue mentions tesseract4, but the post mentions tesseract

bosd avatar Sep 26 '22 17:09 bosd

@erkin98 @Carlos314159 There is a new PR at https://github.com/OCA/edi/pull/722 to fix/restore this functionality. Would you be so kind as to review it? :pray:

Edit: Oops, was not paying attention. Thought I was posting this in the odoo repo :astonished:

bosd avatar Feb 10 '23 20:02 bosd

Anyway, the tesseract input module has recently been refactored. Tesseract 4 is now the default. Languages are detected automatically.

It can now be used on image files. Currently there is one issue with parsing PDF files, but a hotfix is on its way in https://github.com/invoice-x/invoice2data/pull/468

Although my previous post was in the wrong repo, it might be useful to some of you as a code example.

You can see a live test at that PR:

  • In GitHub Actions (click "Show all checks"), click on runboat/build (Details)
  • Click on Start (wait a couple of minutes until the background turns green)
  • Click on Live
  • Log in with user: admin password: admin
  • Go to Invoicing --> Vendors --> Import vendor bills

Closing this one as completed. Feel free to reopen.

bosd avatar Feb 10 '23 20:02 bosd