
Consider doing On-host Pixels to PDF Conversion

Open deeplow opened this issue 7 months ago • 4 comments

Historically, in the container-based version of Dangerzone, the pixels-to-PDF conversion has happened in a second container. This was needed because Dangerzone relied on many Linux-native programs for conversion, such as GraphicsMagick and Ghostscript (for compression via ps2pdf). In Qubes, the conversion from pixels to PDF already happens on the host.

If we proceed with PyMuPDF (see #622), we no longer need such programs, so we are left with only one dependency to install on the host: Tesseract-OCR.
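Since Tesseract-OCR would be the only host dependency, a startup check for it could look roughly like the sketch below. This is not code from Dangerzone: the function names `tesseract_languages` and `parse_language_list` are made up for illustration, and the sketch assumes the binary is called `tesseract`, is on `PATH`, and prints its `--list-langs` output on standard output.

```python
import shutil
import subprocess


def parse_language_list(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines()]
    # Skip blank lines and the header line ("List of available languages ...").
    return {line for line in lines if line and not line.lower().startswith("list of")}


def tesseract_languages(binary="tesseract"):
    """Return the set of installed OCR languages, or None if Tesseract is unusable."""
    path = shutil.which(binary)
    if path is None:
        # Tesseract is not installed (or not on PATH) on this host.
        return None
    result = subprocess.run(
        [path, "--list-langs"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return parse_language_list(result.stdout)


if __name__ == "__main__":
    langs = tesseract_languages()
    if langs is None:
        print("Tesseract-OCR not found on the host")
    else:
        print("Installed OCR languages:", ", ".join(sorted(langs)))
```

Returning `None` for a missing binary, rather than an empty set, lets a caller tell "Tesseract is not installed" apart from "Tesseract is installed but has no language data".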

Tasks:

  • [x] Figure out Tesseract-OCR Windows and macOS packaging
  • [ ] Adapt code to run client-side (@deeplow did a PoC and the change was trivial)
  • [ ] Build the PDF as pages are created (we currently convert everything first and only then do the rest)
  • [ ] Download OCR language data under share/ for inclusion on Windows / macOS, as well as for development
  • [ ] Add OCR language data for all languages as optional dependencies in our Linux packages (.deb/.rpm)
  • [ ] Test on-host conversion on Windows, both on local builds and on our CI
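For the "build the PDF as pages are created" task, the general shape could be something like the sketch below: read one page of pixels at a time and hand it to the PDF builder immediately, instead of buffering the whole document. The framing used here (a 2-byte big-endian page count, then for each page a 2-byte width, a 2-byte height and raw RGB bytes) is an assumption made for illustration, not a statement about Dangerzone's actual protocol. `add_page` is a hypothetical callback standing in for whatever appends a page to the output PDF.

```python
import io
import struct


def read_exact(stream, size):
    """Read exactly `size` bytes or raise if the stream ends early."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data


def iter_pages(stream):
    """Yield (width, height, rgb_bytes) for one page at a time."""
    (page_count,) = struct.unpack(">H", read_exact(stream, 2))
    for _ in range(page_count):
        width, height = struct.unpack(">HH", read_exact(stream, 4))
        # Assumed layout: 3 bytes (R, G, B) per pixel.
        pixels = read_exact(stream, width * height * 3)
        yield width, height, pixels


def convert_incrementally(stream, add_page):
    """Pass each page to `add_page` as soon as it is read; return the page count."""
    count = 0
    for width, height, pixels in iter_pages(stream):
        add_page(width, height, pixels)  # hypothetical hook: append page to the PDF
        count += 1
    return count


if __name__ == "__main__":
    # Two tiny fake pages: 1x1 and 2x1 pixels.
    fake = io.BytesIO(
        struct.pack(">H", 2)
        + struct.pack(">HH", 1, 1) + b"\x01\x02\x03"
        + struct.pack(">HH", 2, 1) + bytes(6)
    )
    convert_incrementally(fake, lambda w, h, p: print(f"page {w}x{h}, {len(p)} bytes"))
```

Because `iter_pages` is a generator, only one page of pixel data is held in memory at a time, and the PDF can grow while later pages are still being converted.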

deeplow · Nov 24 '23 12:11