Helmut Wollmersdorfer

18 issues by Helmut Wollmersdorfer

Compared original XML "ONB_newseye" to current line texts "AustrianNewspapers". ``` compare_xml.pl Version 0.01 Compare XML text output against ground truth (GRT): XML: ONB_newseye GRT: AustrianNewspapers Summary: lines words chars items...

Release 1.1.1

The line images in `gt/train` are sometimes too short, e. g. ``` ONB_ibn_19110701_018.tif_tl_6.gt.txt:###wertet. Das geringſte Gebot beträgt 10.020 Kronen. ONB_ibn_19110701_018.tif_tl_63.gt.txt:b###chten Utenſilien, über welche die öffentliche Ver⸗ ONB_ibn_19110701_018.tif_tl_64.gt.txt:###ßerung ausgeſchrieben wird. Offerte...

Release 1.1.1
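The `gt/train` examples in the issue above look like search output of the form `file:text`, with `###` inside the ground-truth text. A minimal sketch of how such files could be listed, assuming that `###` marks the part of the text affected by a too-short line image (the truncated preview does not confirm this) and that the line texts are UTF-8 `*.gt.txt` files in one directory. The function name `find_marked_lines` is hypothetical.

```python
from pathlib import Path

# Assumption: "###" marks text that the too-short line image does not cover.
MARKER = "###"


def find_marked_lines(train_dir, marker=MARKER):
    """Return (file name, text) pairs for *.gt.txt files whose text contains marker."""
    hits = []
    for path in sorted(Path(train_dir).glob("*.gt.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if marker in text:
            hits.append((path.name, text))
    return hits
```

Calling `find_marked_lines("gt/train")` would return the candidate lines to inspect by hand; it does not look at the images themselves.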

E. g. training set `ONB_aze_18950706_1.xml` contains ``` ``` but there are only line files following the id pattern `tl_\d+`. Either we rename the line ids in the XML or use the...

Release 1.1.1
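The mismatch between XML line ids and the `tl_\d+` pattern of the line files, described in the issue above, could be checked along these lines. This is only a sketch: it assumes line file names of the form `<page>_tl_<n>.gt.txt`, as in the file names quoted on this page, and that the ids and file names passed in belong to a single page. The helper names and the id `r1l2` in the usage note are hypothetical.

```python
import re

# Assumed file name layout: <page>_<line id>.gt.txt, with line id matching tl_\d+
LINE_FILE = re.compile(r"^(?P<page>.+)_(?P<line_id>tl_\d+)\.gt\.txt$")


def split_line_file(name):
    """Split a line file name into (page, line id), or return None if it does not match."""
    match = LINE_FILE.match(name)
    if match is None:
        return None
    return match.group("page"), match.group("line_id")


def unmatched_ids(xml_line_ids, file_names):
    """Return the XML line ids that have no line file with the same id."""
    parsed = (split_line_file(name) for name in file_names)
    on_disk = {parts[1] for parts in parsed if parts is not None}
    return [line_id for line_id in xml_line_ids if line_id not in on_disk]
```

For example, `unmatched_ids(["tl_6", "r1l2"], ["ONB_ibn_19110701_018.tif_tl_6.gt.txt"])` reports only the id that has no corresponding line file.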

Shocked at first, I checked the git history to see whether I had deleted them by mistake. No, they never existed. E. g. ``` ONB_nfp_19110701_006.tif_tl_6.gt.txt ONB_nfp_19110701_006.tif_tl_6.png ``` In the XML...

Release 1.1.1

As already mentioned in issues #29, #28, #3 and #2, there are problems with the line images: they don't contain exactly the text of the corresponding `*.gt.txt` files. Problems...

Just for the record: some characters are rotated in steps of 90 degrees. This happens often with n/u. In the binarised images of low quality and Fraktur the difference...

As I understand it, updates and corrections are applied to both the whole-page XML and the line text files. I will check whether a quick hack to update the XML pays off. I...

There are some numbers in the original images where the decimal dot does not sit near the baseline. It is either at the height of the hyphen, or at the top edge...