page validator: additional checks
Beyond actual (syntactic) schema violations ("validity") and conventional (semantic) problems ("inconsistency"), we might want to check for and repair additional issues:
- if `/PcGts/Page/ReadingOrder` or any of its children is empty (in which case PageViewer fails to load), as repaired by https://github.com/bertsky/workflow-configuration
- if any `/PcGts/Page/ReadingOrder//@regionRef` does not point to an existing segment identifier (in which case PageViewer fails to load), as repaired by https://github.com/bertsky/workflow-configuration
- if any `//TextEquiv` contains neither `PlainText` nor `Unicode` (in which case PageViewer fails to load), as repaired by https://github.com/bertsky/workflow-configuration
- ...?
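To make the three conditions above concrete, here is a minimal sketch using only the Python standard library. It is not the existing validator code: `local` and `find_loading_problems` are hypothetical helpers, and the sketch assumes that "empty" means a `ReadingOrder` or group element without child elements, and that "segment identifier" means any `@id` under `Page` outside of `ReadingOrder`.

```python
import xml.etree.ElementTree as ET

def local(elem):
    """Return an element's tag name without its namespace."""
    return elem.tag.rsplit('}', 1)[-1]

def find_loading_problems(root):
    """Collect messages for the three conditions (hypothetical helper)."""
    problems = []
    segment_ids = set()
    reading_orders = []
    for page in (el for el in root.iter() if local(el) == 'Page'):
        for child in page:
            if local(child) == 'ReadingOrder':
                reading_orders.append(child)
            else:
                # Assumption: every id outside ReadingOrder is a segment identifier
                segment_ids.update(
                    el.get('id') for el in child.iter() if el.get('id'))
    for reading_order in reading_orders:
        for el in reading_order.iter():
            name = local(el)
            # Assumption: "empty" means ReadingOrder or a group without children
            if (name == 'ReadingOrder' or 'Group' in name) and len(el) == 0:
                problems.append('empty %s' % name)
            ref = el.get('regionRef')
            if ref is not None and ref not in segment_ids:
                problems.append('dangling regionRef "%s"' % ref)
    for text_equiv in (el for el in root.iter() if local(el) == 'TextEquiv'):
        if not {local(child) for child in text_equiv} & {'PlainText', 'Unicode'}:
            problems.append('TextEquiv without PlainText or Unicode')
    return problems
```

Matching on local names keeps the sketch independent of the PAGE namespace version. Calling `find_loading_problems(ET.parse(path).getroot())` on a file would return an empty list if none of the three conditions applies.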
Also:
- if any `/PcGts/Page/ReadingOrder/(OrderedGroup|OrderedGroupIndexed)/@index` is not in order (or clashing)
- if any `//TextEquiv/@index` is not in order (or clashing)
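The two index checks could be sketched along the same lines. This is again a rough illustration, not existing validator code: `check_indexes` and `find_index_problems` are hypothetical names, the sketch assumes every `@index` value parses as an integer, and it compares the indexes of the members of each group, and of the `TextEquiv` alternatives of each segment, in document order.

```python
import xml.etree.ElementTree as ET

def local(elem):
    """Return an element's tag name without its namespace."""
    return elem.tag.rsplit('}', 1)[-1]

def check_indexes(indexes):
    """Classify a list of integer indexes (hypothetical helper)."""
    if len(set(indexes)) != len(indexes):
        return 'clashing'
    if indexes != sorted(indexes):
        return 'not in order'
    return None

def find_index_problems(root):
    """Return (parent id, verdict) pairs for groups and TextEquiv parents."""
    problems = []
    for parent in root.iter():
        if 'Group' in local(parent):
            # Members of a reading order group
            members = list(parent)
        else:
            # Alternative TextEquiv elements of one segment
            members = [c for c in parent if local(c) == 'TextEquiv']
        # Assumption: @index is optional, and always an integer when present
        indexes = [int(m.get('index')) for m in members
                   if m.get('index') is not None]
        verdict = check_indexes(indexes)
        if verdict:
            problems.append((parent.get('id'), verdict))
    return problems
```

Reporting the parent's `@id` rather than the offending index leaves open how a repair should renumber the members, which seems like a separate decision from detection.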