ocr-fileformat

ABBYY2Alto

Open • zuphilip opened this issue 8 years ago • 13 comments

https://github.com/ironymark/AbbyyToAlto: transformation with PHP, GPL v3

zuphilip, May 14 '16

Yes, I've seen it, but I very much prefer a declarative transformation in XSLT, which has no possible side effects and is easier to test. Maybe we can convert it to XSLT?

kba, May 14 '16

Yes, it would be preferable to use XSLT for the transformation.

zuphilip, May 14 '16
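To make concrete what any of these converters (PHP, Java, or a future XSLT) has to do, here is a minimal sketch in Python using only the standard library. It is not taken from any of the projects discussed here. It assumes FineReader 10 XML (namespace `http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml`) as input and targets ALTO v4. The core step is grouping ABBYY's per-character `charParams` boxes into words and mapping `page`/`block`/`line` onto ALTO `Page`/`TextBlock`/`TextLine`/`String`. The function names and the generated `ID` values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: FineReader 10 input; other FineReader versions use other namespace URIs.
ABBYY_NS = "http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"
A = {"a": ABBYY_NS}


def q(tag):
    """Qualify an element name with the ALTO v4 namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def ltrb(el):
    """Read ABBYY's left/top/right/bottom pixel attributes as ints."""
    return tuple(int(el.get(k)) for k in ("l", "t", "r", "b"))


def box_attrs(l, t, r, b):
    """Convert an l/t/r/b box into ALTO HPOS/VPOS/WIDTH/HEIGHT attributes."""
    return {"HPOS": str(l), "VPOS": str(t),
            "WIDTH": str(r - l), "HEIGHT": str(b - t)}


def line_to_words(line):
    """Group the charParams of one ABBYY <line> into (text, box) words,
    splitting on whitespace characters."""
    words, chars, box = [], [], None
    for cp in line.iterfind(".//a:charParams", A):
        ch = cp.text or " "
        if ch.isspace():
            if chars:
                words.append(("".join(chars), box))
            chars, box = [], None
            continue
        l, t, r, b = ltrb(cp)
        chars.append(ch)
        # Union of the character boxes seen so far in the current word.
        box = (l, t, r, b) if box is None else (
            min(box[0], l), min(box[1], t), max(box[2], r), max(box[3], b))
    if chars:
        words.append(("".join(chars), box))
    return words


def abbyy_to_alto(abbyy_xml):
    """Map ABBYY page/block/line/charParams onto ALTO Page/TextBlock/TextLine/String.
    Only blocks with blockType="Text" are converted."""
    ET.register_namespace("", ALTO_NS)  # global: serialize ALTO as the default namespace
    src = ET.fromstring(abbyy_xml)
    alto = ET.Element(q("alto"))
    desc = ET.SubElement(alto, q("Description"))
    ET.SubElement(desc, q("MeasurementUnit")).text = "pixel"
    layout = ET.SubElement(alto, q("Layout"))
    for pi, page in enumerate(src.iterfind("a:page", A), start=1):
        apage = ET.SubElement(layout, q("Page"), ID=f"p{pi}",
                              PHYSICAL_IMG_NR=str(pi),
                              WIDTH=page.get("width"), HEIGHT=page.get("height"))
        space = ET.SubElement(apage, q("PrintSpace"))
        blocks = page.iterfind("a:block[@blockType='Text']", A)
        for bi, block in enumerate(blocks, start=1):
            tb = ET.SubElement(space, q("TextBlock"), ID=f"p{pi}_b{bi}",
                               **box_attrs(*ltrb(block)))
            for li, line in enumerate(block.iterfind(".//a:line", A), start=1):
                tl = ET.SubElement(tb, q("TextLine"), ID=f"p{pi}_b{bi}_l{li}",
                                   **box_attrs(*ltrb(line)))
                for text, box in line_to_words(line):
                    ET.SubElement(tl, q("String"), CONTENT=text, **box_attrs(*box))
    return ET.tostring(alto, encoding="unicode")
```

Deliberately left out: `SP` elements between words, non-`Text` blocks such as tables, font styles, confidences, and `Processing` metadata. The output has not been validated against the ALTO schema; a real converter, XSLT or otherwise, would need to cover these.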

There is also a newer implementation with Java (+Maven): https://github.com/Mewel/abbyy-to-alto

zuphilip, Jan 01 '20

How does that compare with https://github.com/PRImA-Research-Lab/prima-page-converter? @maxnth

kba, Jan 02 '20

> There is also a newer implementation with Java (+Maven): https://github.com/Mewel/abbyy-to-alto

That source code includes at least one copyrighted ~xsl~ file.

stweil, Jun 23 '22

> > There is also a newer implementation with Java (+Maven): https://github.com/Mewel/abbyy-to-alto
>
> That source code includes at least one copyrighted xsl file.

It does? I only saw that they include the copyrighted schema for Abbyy 10. We could ask ABBYY for a license to redistribute or omit that file and use the make vendor mechanism.

kba, Jun 23 '22

> How does that compare with https://github.com/PRImA-Research-Lab/prima-page-converter? @maxnth

I had problems with prima-page-converter (going to open a bug report), while Mewel/abbyy-to-alto worked right away.

mikegerber, Jun 23 '22

> they include the copyrighted schema for Abbyy 10

Yes, sorry, that was the one which I meant.

stweil, Jun 23 '22

> I had problems with prima-page-converter (going to open a bug report),

https://github.com/PRImA-Research-Lab/prima-page-viewer/issues/24 - I opened the issue against prima-page-viewer as it is affected, too.

mikegerber, Jun 24 '22

> while Mewel/abbyy-to-alto worked right away.

Sort of - it does not produce Processing tags (or the ALTO v2 equivalent), so it is lacking too.

mikegerber, Jun 24 '22
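For reference, here is a sketch of what emitting that metadata could look like, again in Python with the standard library. It is not taken from abbyy-to-alto itself. It assumes that newer ALTO (here v4) records the step as `Description/Processing`, while ALTO v2 uses `Description/OCRProcessing/ocrProcessingStep`; in both cases the step carries `processingDateTime` and `processingSoftware`. The function name `add_processing`, the default `ID` value, and the software name/version in the usage lines are placeholders.

```python
import xml.etree.ElementTree as ET

ALTO2 = "http://www.loc.gov/standards/alto/ns-v2#"
ALTO4 = "http://www.loc.gov/standards/alto/ns-v4#"


def add_processing(description, ns, software, version, when, step_id="OCR_0"):
    """Append an OCR processing record to an existing ALTO <Description>.

    ALTO v2 wraps the step in <OCRProcessing>/<ocrProcessingStep>;
    the newer form assumed here (ALTO v4) uses a single <Processing> element.
    """
    def q(tag):
        return f"{{{ns}}}{tag}"

    if ns == ALTO2:
        wrapper = ET.SubElement(description, q("OCRProcessing"), ID=step_id)
        step = ET.SubElement(wrapper, q("ocrProcessingStep"))
    else:
        step = ET.SubElement(description, q("Processing"), ID=step_id)
    # Schema order inside a step: processingDateTime before processingSoftware.
    ET.SubElement(step, q("processingDateTime")).text = when
    sw = ET.SubElement(step, q("processingSoftware"))
    # Inside processingSoftware, softwareName precedes softwareVersion.
    ET.SubElement(sw, q("softwareName")).text = software
    ET.SubElement(sw, q("softwareVersion")).text = version
    return step


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    desc = ET.Element(f"{{{ALTO4}}}Description")
    add_processing(desc, ALTO4, "ExampleOCR", "1.0", "2022-06-27T12:00:00")
    print(ET.tostring(desc, encoding="unicode"))
```

Branching on the namespace keeps one code path for both schema generations; a converter that only ever writes one ALTO version would simply hard-code the matching element names.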

> > > There is also a newer implementation with Java (+Maven): https://github.com/Mewel/abbyy-to-alto
> >
> > That source code includes at least one copyrighted xsl file.
>
> It does? I only saw that they include the copyrighted schema for Abbyy 10. We could ask ABBYY for a license to redistribute or omit that file and use the make vendor mechanism.

I'd also like to point out that prima-page-converter has a similar problem: the PrimaText library is not open source (see https://github.com/PRImA-Research-Lab/prima-page-converter/issues/17#issuecomment-769817720).

mikegerber, Jun 24 '22

Somewhat related: I just found a converter from ABBYY to hOCR made by the Internet Archive. I haven't tested it myself yet.

stweil, Jun 26 '22

> > while Mewel/abbyy-to-alto worked right away.
>
> Sort of - it does not produce Processing tags (or the ALTO v2 equivalent), so it is lacking too.

I've added that in https://github.com/Mewel/abbyy-to-alto/pull/16.

mikegerber, Jun 27 '22