ocr-fileformat

PAGE format extension in Transkribus

Open kba opened this issue 9 years ago • 6 comments

https://github.com/Transkribus/TranskribusPageformat/blob/master/pagecontent_extension.xsd

kba avatar Sep 06 '16 15:09 kba

Is this extension of the PAGE format still relevant? I see that the latest changes in this repo are from March 2017.

zuphilip avatar Dec 30 '19 12:12 zuphilip

> Is this extension of the PAGE format still relevant? I see that the latest changes in this repo are from March 2017.

It is, but IIUC the schema has moved to https://gitlab.com/readcoop/transkribus/TranskribusCore/-/blob/master/src/main/resources/xsd/pagecontent_extension.xsd

We could now support it as transkribus by adding a script based on https://github.com/kba/transkribus-to-prima

bertsky avatar Jan 12 '22 17:01 bertsky

Yes, something like `ocr-transform transkribus-page page2019` for ocr{d_,-}fileformat will be the next step to properly integrate this into OCR-D.

kba avatar Jan 12 '22 17:01 kba

Maybe page-transkribus? Or page-readcoop? Or page2013t?

I think the last one fits best in our current naming scheme. And if we add a one-line description for each supported format, that could explain the "t".

stweil avatar Jan 12 '22 17:01 stweil

It's 2022 and we are not short of bytes any longer: I recommend against such abbreviations.

Also: page2013transkribus

bertsky avatar Jan 12 '22 17:01 bertsky

We already have cryptic short names like gcv (that's why I suggested extending the help message), but page2013transkribus is fine with me, too.

stweil avatar Jan 12 '22 18:01 stweil