ocr-fileformat
PAGE format extension in Transkribus
https://github.com/Transkribus/TranskribusPageformat/blob/master/pagecontent_extension.xsd
Is this extension of the PAGE format still relevant? I see that the latest changes in this repo are from March 2017.
It is, but IIUC the schema has moved to https://gitlab.com/readcoop/transkribus/TranskribusCore/-/blob/master/src/main/resources/xsd/pagecontent_extension.xsd
We could now support it as transkribus by adding a script based on https://github.com/kba/transkribus-to-prima
Yes, something like ocr-transform transkribus-page page2019 for ocr{d_,-}fileformat will be the next step to properly integrate this into OCR-D.
Maybe page-transkribus? Or page-readcoop? Or page2013t?
I think the last one fits best in our current naming scheme. And if we add a one-line description for each supported format, that could explain the "t".
It's 2022 and we are not short of bytes any longer: I recommend against such abbreviations.
Also: page2013transkribus
We already have cryptic short names like gcv (that's why I suggested extending the help message), but page2013transkribus is fine for me, too.
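To illustrate the help-message idea from this thread, here is a minimal sketch of a lookup table that pairs each short format name with a one-line description. Everything in it is an assumption made for illustration: the names FORMAT_DESCRIPTIONS and format_help are hypothetical, the description texts are one reading of the names discussed above, and none of this is taken from the ocr-fileformat code base.

```python
# Hypothetical registry: short format name -> one-line description.
# The description texts are illustrative assumptions, not project wording.
FORMAT_DESCRIPTIONS = {
    "gcv": "Google Cloud Vision OCR output",
    "page2013transkribus": "PAGE 2013 with the Transkribus extension",
    "page2019": "PAGE 2019",
}


def format_help(descriptions):
    """Render the registry as aligned help lines, sorted by format name."""
    # Pad every name to the longest one so the descriptions line up.
    width = max(len(name) for name in descriptions)
    return "\n".join(
        f"  {name.ljust(width)}  {text}"
        for name, text in sorted(descriptions.items())
    )


print(format_help(FORMAT_DESCRIPTIONS))
```

With a table like this, a long name such as page2013transkribus and a short one such as gcv both stay understandable in the help output, so the choice of name matters less.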