Feature request: Page concatenation during conversion

Open jsbien opened this issue 2 years ago • 0 comments

Transkribus (https://readcoop.eu/transkribus/?sc=Transkribus), which just reached 100 000 users, exports PAGE and ALTO as a separate file for every page, and the actual page numbers are not stored in those files. In my ALTO -> hOCR -> dsed workflow I have to edit the page numbers in the *.dsed files before they can be used as valid djvused input (I use the transcription as the hidden text layer in a DjVu document). It would be nice to solve the problem in a general and elegant way.
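
To make the request concrete, here is a rough Python sketch of the renumbering step that could happen automatically during conversion. It is only an illustration under assumptions: it supposes that each per-page *.dsed file contains exactly one `select ...` line naming the page, and the function names `renumber_dsed` and `concatenate_dsed` are hypothetical, not part of ocr-fileformat or any existing tool.

```python
import re

# Assumption: the per-page script addresses its page with a single line
# starting with "select"; files with extra select commands need more care.
SELECT_LINE = re.compile(r"^select\b.*$", re.MULTILINE)


def renumber_dsed(text: str, page_number: int) -> str:
    """Point the first `select` command of a djvused script at page_number."""
    return SELECT_LINE.sub(f"select {page_number}", text, count=1)


def concatenate_dsed(page_texts, first_page=1):
    """Join per-page scripts, numbering them consecutively from first_page."""
    parts = []
    for offset, text in enumerate(page_texts):
        renumbered = renumber_dsed(text, first_page + offset)
        # Ensure every page script ends with exactly one newline.
        parts.append(renumbered.rstrip("\n") + "\n")
    return "".join(parts)


if __name__ == "__main__":
    # Three pages that all claim to be page 1, as in the workflow above.
    page = "select 1\nset-txt\n(page 0 0 10 10)\n.\n"
    print(concatenate_dsed([page, page, page]), end="")
```

The combined output could then be saved to a file and applied in one pass with `djvused -f combined.dsed -s document.djvu`, provided the per-page files are passed in document order.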

jsbien, Nov 18 '22 09:11