Suggestion to prevent volume/path breaking changes
From the 0.16.0 release notes:
OCR update to tesseract 5 from 4.00 (Requires volume path change) and OCRmyPDF update
I assume it refers to this:
- /location/of/trainingData:/usr/share/tesseract-ocr/4.00/tessdata
Changing into:
- /location/of/trainingData:/usr/share/tesseract-ocr/5/tessdata
My suggestion is to not require the version number in the docker config, so that Tesseract upgrades stop causing breaking changes. Possibly you could set up a folder like /trainingData that is symlinked (or similar) to wherever you need it (/usr/share/tesseract-ocr/5/tessdata in this case). This would require one more breaking change but would prevent future ones. Also, if you do it soon, most people will likely only need to fix their configurations once.
So then people just mount
- /location/of/trainingData:/trainingData
I don't know how to do this or I'd submit a PR.
For now I added a script which copies /usr/share/tesseract-ocr/4.00/tessdata to /usr/share/tesseract-ocr/5/tessdata
So there shouldn't actually be a breaking change, but I wanted to mention it to get people to copy things over anyway
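For illustration only, a copy step like the one described above could look roughly like the sketch below. This is not the actual script from the image: the function name `copy_tessdata` and the choice to never overwrite files already present in the new folder are assumptions.

```python
import shutil
from pathlib import Path


def copy_tessdata(old_dir: Path, new_dir: Path) -> list[str]:
    """Copy files from old_dir into new_dir, skipping names that already exist.

    Hypothetical helper: returns the names of the files that were copied.
    """
    copied: list[str] = []
    if not old_dir.is_dir():
        # Nothing is mounted at the old path, so there is nothing to migrate.
        return copied
    new_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(old_dir.iterdir()):
        dst = new_dir / src.name
        # Assumption: files already shipped in the new folder win over old ones.
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            copied.append(src.name)
    return copied


if __name__ == "__main__":
    # Paths taken from the volume mappings quoted in this thread.
    copy_tessdata(
        Path("/usr/share/tesseract-ocr/4.00/tessdata"),
        Path("/usr/share/tesseract-ocr/5/tessdata"),
    )
```

Run at container start, this would leave a mount on the old 4.00 path usable after the upgrade, at the cost of duplicating the files.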
But I see your point: there should be a static location. I will consider this!
Tesseract did this in its latest version a long time ago, moving everything to just /usr/share/tesseract
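One possible shape for the static location, as a rough sketch rather than a decided design: at startup, replace the versioned tessdata folder with a symlink to a fixed folder such as /trainingData, so the mount path in the docker config never has to change. The function name `link_static_tessdata` and the step that seeds the static folder with the bundled files are assumptions.

```python
import shutil
from pathlib import Path


def link_static_tessdata(static_dir: Path, versioned_dir: Path) -> None:
    """Make versioned_dir a symlink pointing at static_dir (hypothetical helper)."""
    static_dir.mkdir(parents=True, exist_ok=True)
    if versioned_dir.is_symlink():
        # A link left over from a previous start: recreate it below.
        versioned_dir.unlink()
    elif versioned_dir.is_dir():
        # Assumption: keep the bundled language files by copying any that the
        # user has not supplied, then remove the real directory.
        for src in versioned_dir.iterdir():
            dst = static_dir / src.name
            if src.is_file() and not dst.exists():
                shutil.copy2(src, dst)
        shutil.rmtree(versioned_dir)
    versioned_dir.parent.mkdir(parents=True, exist_ok=True)
    versioned_dir.symlink_to(static_dir, target_is_directory=True)


if __name__ == "__main__":
    # /trainingData is the static mount point suggested in this thread; the
    # versioned path is whatever the installed Tesseract expects.
    link_static_tessdata(
        Path("/trainingData"),
        Path("/usr/share/tesseract-ocr/5/tessdata"),
    )
```

With something like this in place, a future Tesseract upgrade would only change the second argument inside the image, and the user-facing mount would stay `/location/of/trainingData:/trainingData`.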