
Support for PDF receipts

Open. bram-atmire opened this issue 6 years ago • 1 comment

Not sure whether others share this use case: I use Scanbot to scan my receipts as multi-page PDFs. It would be great if this tool could work on these PDFs.

Scanbot does some OCR of its own, but the result isn't very useful to me because it adds too much noise: a receipt contains a lot of text, and I'm only interested in the articles and the price per article, so that I can track how prices evolve over multiple weeks.
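To illustrate the kind of filtering meant here, below is a rough sketch that keeps only lines that look like "article name followed by a price" from already-OCR'd text. Everything in it is an assumption for illustration: the `extract_items` name, the regular expression, and the `SKIP_WORDS` list are hypothetical and not part of this tool, and real receipts will likely need a layout-specific pattern.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: some text, whitespace, a price such as 1,29 or 2.49,
# optionally followed by a single-letter tax code at the end of the line.
PRICE_LINE = re.compile(r"^(?P<name>.*?\S)\s+(?P<price>\d+[.,]\d{2})\s*[A-Z]?$")

# Hypothetical list of words marking summary lines rather than articles.
SKIP_WORDS = {"total", "subtotal", "tax", "vat", "change", "cash"}


def extract_items(ocr_text: str) -> List[Tuple[str, float]]:
    """Return (article, price) pairs for lines that look like purchased items."""
    items = []
    for line in ocr_text.splitlines():
        match = PRICE_LINE.match(line.strip())
        if not match:
            continue  # header, footer, or other noise without a trailing price
        name = match.group("name")
        if set(name.lower().split()) & SKIP_WORDS:
            continue  # summary line such as "TOTAL 3,78"
        price = float(match.group("price").replace(",", "."))
        items.append((name, price))
    return items
```

Calling `extract_items` on each week's OCR output would give a list of pairs that could be compared over time, although article names usually vary slightly between scans and would need some normalization first.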

bram-atmire, Oct 26 '19 13:10

Your use case makes a lot of sense to me. We could use pdf2image as a preprocessor that converts the PDF pages to images before recognizing the text. I think that would be the easiest thing to try. Alternatively, you could try OCRmyPDF to see if it works with your inputs out of the box.
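A minimal sketch of the pdf2image idea, not the actual implementation: each page of the PDF is rasterized with `pdf2image.convert_from_path` and then passed to an OCR function. The helper name `pdf_to_page_texts`, the 300 dpi default, and the use of `pytesseract.image_to_string` as the OCR step are assumptions for illustration. pdf2image also needs the poppler utilities installed on the system.

```python
from typing import Callable, List, Optional


def pdf_to_page_texts(
    pdf_path: str,
    dpi: int = 300,
    convert: Optional[Callable] = None,
    ocr: Optional[Callable] = None,
) -> List[str]:
    """Rasterize every page of a PDF and OCR it; returns one string per page."""
    if convert is None:
        # Imported lazily so the function can be tried with stand-ins
        # on machines without pdf2image/poppler.
        from pdf2image import convert_from_path
        convert = convert_from_path
    if ocr is None:
        # Assumption: Tesseract (via pytesseract) is the OCR engine.
        from pytesseract import image_to_string
        ocr = image_to_string

    pages = convert(pdf_path, dpi=dpi)  # list of PIL images, one per page
    return [ocr(page) for page in pages]
```

With both libraries installed, `pdf_to_page_texts("receipts.pdf")` would return the recognized text per page (the file name is just a placeholder). The `convert` and `ocr` parameters exist only so that another rasterizer or OCR engine can be swapped in.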

mre, Oct 26 '19 21:10