Install on Raspberry Pi
Sorry -- noob question ahead.
First of all thank you for the fantastic project. Unfortunately, I was not able to install. I got to the part where I run the docker create command. However, it returns:
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm/v7) and no specific platform was requested
Can anyone give me a primer on how to install ocrmypdf-auto on Raspbian? Any help would be much appreciated.
So there is no ARM version of this Docker image.
If you take a look at the Docker Hub tags, for example, you will see the options available:
https://hub.docker.com/r/cmccambridge/ocrmypdf-auto/tags
If there was an arm version, you could specify it in your docker run command. So instead of cmccambridge/ocrmypdf-auto it might be cmccambridge/ocrmypdf-auto:arm to pull a specific version.
Have a look at some of the other docker tags out there, for example - https://hub.docker.com/_/node?tab=tags
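As a rough way to compare the two platforms named in the warning, here is a minimal sketch. The function name docker_platform_for is made up for illustration, and the mapping covers only the common cases.

```shell
#!/bin/sh
# Translate a kernel machine name (as printed by `uname -m`) into the
# platform string Docker uses in its warning and in --platform.
# docker_platform_for is a hypothetical helper, not part of Docker.
docker_platform_for() {
  case "$1" in
    x86_64)        echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    armv7l)        echo "linux/arm/v7" ;;
    armv6l)        echo "linux/arm/v6" ;;
    *)             echo "unknown" ;;
  esac
}

echo "This host: $(docker_platform_for "$(uname -m)")"

# To see what an already-pulled image was built for (needs Docker):
#   docker image inspect --format '{{.Os}}/{{.Architecture}}' cmccambridge/ocrmypdf-auto
```

If the two strings differ, as with linux/amd64 versus linux/arm/v7 here, the image will not run natively on the host.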
Since there is no prebuilt armhf image, I tried to build it on an RPi4 with docker build . but it fails due to problems with the systemtools and pikepdf installation.
Systemtools fails since it is not able to read the version info correctly from the pikepdf tarball, see https://stackoverflow.com/questions/67074684/pip-has-problems-with-metadata
This can be fixed by adding --upgrade --no-cache-dir --use-deprecated=legacy-resolver to the pip install call in the Dockerfile, like so: pip install --upgrade --no-cache-dir --use-deprecated=legacy-resolver -r /app/requirements.txt
But then the pikepdf build fails, since it does not work on armhf, see https://github.com/pikepdf/pikepdf/issues/138. It seems possible to build it on armhf, see e.g. https://github.com/piwheels/packages/issues/176. In paperless-ngx they seem to have gotten it working as well: https://github.com/paperless-ngx/paperless-ngx/blob/9a1bd9637cbef0e7318a3c61cae14805c6510cea/docker-builders/Dockerfile.pikepdf
It is a pity ocrmypdf-auto is not available for RPi at the moment, as it would be a neat tool to have on a home RPi deployment.
Apologies, all. I don't have an available rpi to do any development on, and have not had time to invest any effort into ocrmypdf-auto in general, as is probably apparent from the growing queue of unresolved Issues. The existing docker image as-is continues to work... but it's not kept up to date with upstream releases right now. (This container is version 11.3.3, upstream is 13.5.0 at the moment.)
For running OCRmyPDF on an RPi, I would recommend trying the official upstream docker container's watched folder feature.
We've looked at the official container in past issues as a plausible complete replacement, or at least a new base image for ocrmypdf-auto, and looking today at the current documentation, it appears that new features have been added to the watcher as well. It may be a sufficient replacement for your needs, depending on what you would like to do.
You can find the instructions here if you'd like to give it a try: https://ocrmypdf.readthedocs.io/en/latest/batch.html#watched-folders-with-docker
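For orientation, here is a minimal sketch of what starting the upstream watcher might look like. It only prints the docker command so it can be reviewed first. The image name jbarlow83/ocrmypdf, the watcher.py entrypoint, and the /input and /output mount points are my reading of the linked documentation and should be checked against it. The host paths and the watcher_command helper are hypothetical, and I have not verified that the upstream image is published for 32-bit ARM.

```shell
#!/bin/sh
# Sketch only: print (do not run) a docker command for the upstream
# OCRmyPDF watched-folder script.
# ASSUMPTIONS: image name, watcher.py, and the /input and /output mount
# points follow the linked documentation; verify them there before use.

# Hypothetical host folders; override via the environment.
INPUT_DIR="${INPUT_DIR:-/home/pi/scans/in}"
OUTPUT_DIR="${OUTPUT_DIR:-/home/pi/scans/out}"

# watcher_command is an illustrative helper, not part of OCRmyPDF.
# $1 = host folder to watch, $2 = host folder for results.
watcher_command() {
  echo "docker run --rm" \
       "--volume $1:/input" \
       "--volume $2:/output" \
       "--env PYTHONUNBUFFERED=1" \
       "--entrypoint python3" \
       "jbarlow83/ocrmypdf watcher.py"
}

watcher_command "$INPUT_DIR" "$OUTPUT_DIR"
```

Once the printed paths look right, the command can be pasted into a terminal. Paths containing spaces would need quoting by hand.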
If that does work on an RPi I'd love to hear back so other folks can benefit as well :-)
I'm definitely open to a PR to add an ARM build to this container, too, if someone is familiar with the process and wants to do so, but realistically I won't be able to create it myself in the near future.