PAGE helper functions in recognize to generateDS API?
Now that the generateDS API has been refactored to make it easier to extend, IMHO it would be useful to have these functions available for all processors:
- page_element_unicode0
- page_element_float0
- page_get_reading_order
- page_update_higher_testequiv_level
Agreed!
page_element_unicode0, page_element_conf0
Maybe these could become member functions get_Unicode0 and get_conf0 of GlyphType, WordType, TextLineType and TextRegionType.
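A minimal sketch of what such getters might look like, assuming that the trailing 0 means "the first TextEquiv in list order" and that an element without text yields '' for the Unicode and None for the confidence. TextEquivType and WordType below are hypothetical stand-ins for the generateDS classes, just enough to run, and TextEquivMixin is a made-up name; how core would actually attach such methods to GlyphType, WordType, TextLineType and TextRegionType is left open.

```python
from typing import List, Optional


class TextEquivType:
    """Hypothetical stand-in for the generateDS TextEquivType (Unicode, conf)."""

    def __init__(self, Unicode: str, conf: Optional[float] = None):
        self.Unicode = Unicode
        self.conf = conf

    def get_Unicode(self) -> str:
        return self.Unicode

    def get_conf(self) -> Optional[float]:
        return self.conf


class TextEquivMixin:
    """Hypothetical mixin with getters for the first (index 0) TextEquiv.

    Assumes the host class provides get_TextEquiv() returning a list.
    """

    def get_Unicode0(self) -> str:
        # Assumption: no text result at all maps to the empty string.
        textequivs = self.get_TextEquiv()
        return textequivs[0].get_Unicode() if textequivs else ''

    def get_conf0(self) -> Optional[float]:
        # Assumption: missing TextEquiv or missing conf maps to None.
        textequivs = self.get_TextEquiv()
        if not textequivs or textequivs[0].get_conf() is None:
            return None
        # Cast in case the attribute value is stored as a string.
        return float(textequivs[0].get_conf())


class WordType(TextEquivMixin):
    """Hypothetical stand-in for the generateDS WordType."""

    def __init__(self, TextEquiv: Optional[List[TextEquivType]] = None):
        self.TextEquiv = TextEquiv or []

    def get_TextEquiv(self) -> List[TextEquivType]:
        return self.TextEquiv


word = WordType([TextEquivType('Haus', 0.93), TextEquivType('Hans', 0.05)])
print(word.get_Unicode0(), word.get_conf0())  # Haus 0.93
```

Putting the logic in one mixin means all four element types would share a single implementation instead of four copies.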
page_get_reading_order
I use this a lot, but it could be better. Once it lives in ocrd_page_generateds, the function should
- be named get_reading_order_dict or similar (as a member of PageType)
- include instantiating the first/top-level dict
- include referencing the top-level get_ReadingOrder() and its get_OrderedGroup() or get_UnorderedGroup() (all robust to empty results)
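A sketch of such a PageType member following the three points above, under stated assumptions. All classes here are hypothetical stand-ins: real generateDS groups expose their children through several getters (e.g. get_RegionRefIndexed(), get_OrderedGroupIndexed()), which this sketch merges into a single made-up get_children() list for brevity. Only the getter names get_ReadingOrder(), get_OrderedGroup() and get_UnorderedGroup() come from the list above.

```python
from typing import Dict, Optional


class RegionRefType:
    """Hypothetical stand-in for RegionRef / RegionRefIndexed."""

    def __init__(self, regionRef: str):
        self.regionRef = regionRef

    def get_regionRef(self) -> str:
        return self.regionRef


class GroupType:
    """Hypothetical stand-in for (Un)OrderedGroup(Indexed).

    Children are merged into one list (an assumption for brevity).
    """

    def __init__(self, group_id: str, children=(), regionRef: Optional[str] = None):
        self.id = group_id
        self.children = list(children)
        self.regionRef = regionRef

    def get_regionRef(self) -> Optional[str]:
        return self.regionRef

    def get_children(self) -> list:
        return self.children


class ReadingOrderType:
    """Hypothetical stand-in holding at most one top-level group."""

    def __init__(self, OrderedGroup: Optional[GroupType] = None,
                 UnorderedGroup: Optional[GroupType] = None):
        self.OrderedGroup = OrderedGroup
        self.UnorderedGroup = UnorderedGroup

    def get_OrderedGroup(self) -> Optional[GroupType]:
        return self.OrderedGroup

    def get_UnorderedGroup(self) -> Optional[GroupType]:
        return self.UnorderedGroup


def _collect(ro: Dict[str, object], group: GroupType) -> None:
    """Recursively record every element below `group` that has a regionRef."""
    for elem in group.get_children():
        if elem.get_regionRef() is not None:
            ro[elem.get_regionRef()] = elem
        if isinstance(elem, GroupType):
            _collect(ro, elem)


class PageType:
    """Hypothetical stand-in for the generateDS PageType."""

    def __init__(self, ReadingOrder: Optional[ReadingOrderType] = None):
        self.ReadingOrder = ReadingOrder

    def get_ReadingOrder(self) -> Optional[ReadingOrderType]:
        return self.ReadingOrder

    def get_reading_order_dict(self) -> Dict[str, object]:
        """Proposed member: map region IDs to reading-order elements."""
        ro: Dict[str, object] = {}  # the top-level dict is created here
        reading_order = self.get_ReadingOrder()
        if reading_order is None:  # page has no ReadingOrder at all
            return ro
        top = reading_order.get_OrderedGroup()
        if top is None:
            top = reading_order.get_UnorderedGroup()
        if top is not None:  # an empty ReadingOrder yields an empty dict
            _collect(ro, top)
        return ro


page = PageType(ReadingOrderType(OrderedGroup=GroupType('ro0', [
    RegionRefType('r1'),
    GroupType('g1', [RegionRefType('r2'), RegionRefType('r3')], regionRef='r4'),
])))
print(sorted(page.get_reading_order_dict()))  # ['r1', 'r2', 'r3', 'r4']
print(PageType().get_reading_order_dict())    # {}
```

Nested groups that carry a regionRef are recorded as well, so callers can look them up by region ID; whether the real helper should do that is a design choice, not something settled here.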
page_update_higher_testequiv_level
Maybe we could trigger this automatically whenever a TextEquiv is added anywhere, and/or before serialization (in a similar spirit to the planned automatic coordinate sanitization).
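A toy sketch of the "before serialization" variant, assuming a fixed joining rule per level (glyphs joined with '', words with ' ', lines with a newline) and ignoring confidences and hyphenation, which the real helper may treat differently. Element, update_higher_textequiv_levels and serialize are hypothetical names; `text` stands in for the Unicode of the first TextEquiv, the lowest level present is treated as authoritative, and the serializer returns plain text rather than PAGE XML.

```python
from typing import List, Optional

# Assumed joining rule, keyed by the level of the element being rebuilt.
SEPARATOR = {'word': '', 'line': ' ', 'region': '\n'}


class Element:
    """Hypothetical stand-in for TextRegion / TextLine / Word / Glyph."""

    def __init__(self, level: str, text: Optional[str] = None,
                 children: Optional[List['Element']] = None):
        self.level = level
        self.text = text  # plays the role of TextEquiv[0].Unicode
        self.children = children or []


def update_higher_textequiv_levels(element: Element) -> str:
    """Rebuild `text` bottom-up from the children, replacing stale values."""
    if element.children:
        parts = [update_higher_textequiv_levels(c) for c in element.children]
        element.text = SEPARATOR[element.level].join(parts)
    return element.text or ''


def serialize(regions: List[Element]) -> str:
    """Hypothetical serializer that always re-synchronizes TextEquivs first."""
    for region in regions:
        update_higher_textequiv_levels(region)
    return '\n\n'.join(region.text or '' for region in regions)


line = Element('line', 'stale', [
    Element('word', children=[Element('glyph', 'H'), Element('glyph', 'i')]),
    Element('word', 'there'),
])
print(serialize([Element('region', children=[line])]))  # Hi there
print(line.text)                                         # Hi there
```

Hooking into serialization keeps processors free of bookkeeping; triggering on every TextEquiv addition would instead keep the tree consistent at all times, at the cost of repeated recomputation.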
Anyway, the version here is the most complete so far, but it could be simplified with the new API in core.
I should also mention: page_add_to_reading_order
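The thread only names page_add_to_reading_order, so the following is a guess at its behaviour from the name: append a reference to a region to a reading-order group, using indexed references (and returning the next free index) when an index is given, and plain references otherwise. The group and reference classes are hypothetical stand-ins, and the signature shown is an assumption, not the existing helper's.

```python
from typing import List, Optional


class RegionRefType:
    """Hypothetical stand-in for the PAGE RegionRef element."""

    def __init__(self, regionRef: str):
        self.regionRef = regionRef


class RegionRefIndexedType(RegionRefType):
    """Hypothetical stand-in for RegionRefIndexed (carries an explicit index)."""

    def __init__(self, regionRef: str, index: int):
        super().__init__(regionRef)
        self.index = index


class GroupType:
    """Hypothetical stand-in for an (Un)OrderedGroup(Indexed)."""

    def __init__(self):
        self.RegionRef: List[RegionRefType] = []
        self.RegionRefIndexed: List[RegionRefIndexedType] = []

    def add_RegionRef(self, value: RegionRefType) -> None:
        self.RegionRef.append(value)

    def add_RegionRefIndexed(self, value: RegionRefIndexedType) -> None:
        self.RegionRefIndexed.append(value)


def page_add_to_reading_order(rogroup: Optional[GroupType], region_id: str,
                              index: Optional[int] = None) -> Optional[int]:
    """Append a reference to `region_id` to `rogroup` (assumed semantics).

    With `index` (ordered groups), add an indexed reference and return the
    next free index; without it (unordered groups), add a plain reference.
    """
    if rogroup is not None:
        if index is None:
            rogroup.add_RegionRef(RegionRefType(region_id))
        else:
            rogroup.add_RegionRefIndexed(RegionRefIndexedType(region_id, index))
            index += 1
    return index


ordered = GroupType()
idx = 0
for rid in ['r1', 'r2']:
    idx = page_add_to_reading_order(ordered, rid, idx)
print(idx, [(r.regionRef, r.index) for r in ordered.RegionRefIndexed])
# 2 [('r1', 0), ('r2', 1)]
```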