Clemens Neudecker
@bertsky fyi, within ALTO we are currently investigating CITlab's [confidence matrices](https://github.com/CITlabRostock/CITlabConfMat) (or ``ConfMats``) in view of possible lattice support. Are you perhaps familiar with ``ConfMat`` and how it relates to...
@bertsky Thanks for the elaboration. Initial discussions were held in the ALTO board meeting alongside DATeCH2019 last week, but as soon as we have something public, I'll post the link...
Wait, what are "suspiciously small" regions? Will this not get hairy fast with heuristics based on dimensions? What about e.g. thin separator lines or punctuation marks?
@bertsky Thanks, I've updated the title accordingly. Anyway, for all "validations" that are not directly related to violations of the PAGE schema I would expect a ``warning`` or ``suspicious`` flag...
Use of LC xlink instead of W3C xlink fails in mixed validation. Mix-ups/clashes in schema definitions.
> Based on last board meeting, in case the final decision would be to completely remove xLink references we should announce in schema 4.4 documentation this intention (mark it as...
ACCEPT
ACCEPT
@chris1010010 This is great for a head start, many thanks! I will also circulate this within the @OCR-D community for comments and contributions.
FYI there is also ongoing work in the German OCR SIG to complete what Christian started, cf. https://github.com/maxnth/page-alto-ressources and https://github.com/maxnth/prima-core-libs/branches
Related to this, see the normalization of OCR coordinates in TEI at the BSB (from page 30 onwards - unfortunately only in German, but I think you get the gist):...