
OCR PDF Attachments?

Open jmrichardson opened this issue 6 years ago • 1 comments

Does/will OCRmyPDF support embedded documents/attachments in a PDF portfolio? Thanks

jmrichardson avatar Apr 20 '18 02:04 jmrichardson

Not currently, and it's not planned any time soon, but I think you're the second or third person to ask, so there's some demand anyway. (See also #197)

I made some notes on how to go about doing this, in case they're useful to you, or to me as a reference when I implement it:

Ghostscript recently added PDF/A-3 support, so this is possible within Ghostscript. The current solution would be to modify the pdfmark file, named pdfa.ps, generated by ocrmypdf/pdfa.py, to include a step that embeds the file according to the pdfmark specification (see page 30 for the /EMBED command, and this Ghostscript bug for a functioning example). Use absolute paths.
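A rough sketch of what that extra step might generate, assuming the usual pdfmark pattern of a named stream object that is filled from a file and then referenced by /EMBED. The helper names `ps_string` and `embed_pdfmark` and the object name `attachment0` are hypothetical and are not part of ocrmypdf/pdfa.py. The output has not been run through Ghostscript here, so treat it as a starting point rather than a working patch:

```python
from pathlib import Path


def ps_string(text: str) -> str:
    """Return text as a PostScript literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def embed_pdfmark(attachment: Path, objname: str = "attachment0") -> str:
    """Build pdfmark lines that embed one file as a PDF attachment.

    Hypothetical helper: the text it returns would be appended to the
    generated pdfa.ps before Ghostscript runs.
    """
    path = attachment.resolve()  # absolute path, as the notes above recommend
    return "\n".join([
        # Define a named stream object that will hold the file's bytes.
        f"[ /_objdef {{{objname}}} /type /stream /OBJ pdfmark",
        # Mark the stream as an embedded file in its stream dictionary.
        f"[ {{{objname}}} << /Type /EmbeddedFile >> /PUT pdfmark",
        # Copy the file's contents into the stream.
        f"[ {{{objname}}} {ps_string(path.as_posix())} (r) file /PUT pdfmark",
        # Register a file specification that points at the stream.
        f"[ /Name {ps_string(path.name)}",
        f"  /FS << /Type /Filespec /F {ps_string(path.name)} /EF << /F {{{objname}}} >> >>",
        "  /EMBED pdfmark",
        # Close the stream once it has been written.
        f"[ {{{objname}}} /CLOSE pdfmark",
    ])


if __name__ == "__main__":
    print(embed_pdfmark(Path("invoice.xml")))
```

Each attachment would need its own object name, and file names containing parentheses or backslashes have to be escaped as shown, or the PostScript string ends early.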

A better option would be to teach pikepdf how to embed files according to reference manual section 7.11.4, since this would work without Ghostscript. OCRmyPDF will add pikepdf as a dependency soon (I maintain both).

If you're able to do a PR for either I'd be happy to accept.

jbarlow83 avatar Apr 20 '18 06:04 jbarlow83