Allow a String to contain alternative Glyph segmentation hypotheses
Segmenting a String into Glyphs is one of the most inherently difficult OCR tasks. Because of problems with ink or wear, two glyphs can be merged on the page with no white space separating them, or a single glyph can be split by white space.
As a developer of OCR software, I would like to be able to output alternative splits for a single String, with a confidence attached to each split.
ALTO currently provides no way of outputting these alternatives. The existing ALTERNATIVEType and VariantType are not sufficient, because they can only express alternative content, not alternative splits.
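To make the limitation concrete, here is a rough sketch of what can be written today, assuming the ALTO 4.x element and attribute names as I understand them (String, ALTERNATIVE, Glyph, Variant, GC, VC). The word, IDs, coordinates and confidences are invented for illustration: the page shows something that could be read as "corn" or as "com".

```xml
<!-- Fragment of an ALTO 4.x TextLine; IDs, coordinates and confidences are invented. -->
<String ID="s1" HPOS="100" VPOS="200" WIDTH="66" HEIGHT="30" CONTENT="corn" WC="0.62">
  <!-- ALTERNATIVE offers other text for the whole word, with no glyph boxes. -->
  <ALTERNATIVE>com</ALTERNATIVE>
  <Glyph ID="s1g1" HPOS="100" VPOS="200" WIDTH="16" HEIGHT="30" CONTENT="c" GC="0.97"/>
  <Glyph ID="s1g2" HPOS="116" VPOS="200" WIDTH="18" HEIGHT="30" CONTENT="o" GC="0.95"/>
  <Glyph ID="s1g3" HPOS="134" VPOS="200" WIDTH="12" HEIGHT="30" CONTENT="r" GC="0.55">
    <!-- Variant offers another character for this same box only. -->
    <Variant CONTENT="i" VC="0.20"/>
  </Glyph>
  <Glyph ID="s1g4" HPOS="146" VPOS="200" WIDTH="20" HEIGHT="30" CONTENT="n" GC="0.58"/>
</String>
```

Neither ALTERNATIVE nor Variant can say that the "r" and "n" boxes might instead be a single "m" box spanning both.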
One way to achieve this would be to let a String contain StringVariant children of the same type:
<xsd:complexType name="StringType" mixed="false">
  <xsd:sequence minOccurs="0">
    ...
    <xsd:element name="StringVariant" type="StringType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  ...
</xsd:complexType>
This, however, would make it possible to define a different HPOS, VPOS, HEIGHT and WIDTH for the String, which is not desired.
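As a sketch of that problem, the following instance would presumably validate under the first proposal. StringVariant is the hypothetical element from the proposal; the word, IDs, coordinates and confidences are invented.

```xml
<!-- Hypothetical instance under the first proposal; all values are invented. -->
<String ID="s2" HPOS="100" VPOS="200" WIDTH="66" HEIGHT="30" CONTENT="corn" WC="0.62">
  <Glyph ID="s2g1" HPOS="100" VPOS="200" WIDTH="16" HEIGHT="30" CONTENT="c"/>
  <Glyph ID="s2g2" HPOS="116" VPOS="200" WIDTH="18" HEIGHT="30" CONTENT="o"/>
  <Glyph ID="s2g3" HPOS="134" VPOS="200" WIDTH="12" HEIGHT="30" CONTENT="r"/>
  <Glyph ID="s2g4" HPOS="146" VPOS="200" WIDTH="20" HEIGHT="30" CONTENT="n"/>
  <!-- Because StringVariant reuses StringType, it may carry its own box,
       here one that disagrees with the box of the enclosing String. -->
  <StringVariant ID="s2v1" HPOS="98" VPOS="198" WIDTH="70" HEIGHT="34" CONTENT="com" WC="0.31">
    <Glyph ID="s2v1g1" HPOS="100" VPOS="200" WIDTH="16" HEIGHT="30" CONTENT="c"/>
    <Glyph ID="s2v1g2" HPOS="116" VPOS="200" WIDTH="18" HEIGHT="30" CONTENT="o"/>
    <Glyph ID="s2v1g3" HPOS="134" VPOS="200" WIDTH="32" HEIGHT="30" CONTENT="m"/>
  </StringVariant>
</String>
```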
Another approach would be:
<xsd:complexType name="StringType" mixed="false">
  <xsd:sequence minOccurs="0">
    ...
    <xsd:element name="StringVariant" type="StringVariantType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  ...
</xsd:complexType>
<xsd:complexType name="StringVariantType" mixed="false">
  <xsd:sequence minOccurs="0">
    <xsd:element name="Glyph" type="GlyphType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="WC" type="WCType" use="optional"/>
</xsd:complexType>
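An instance under this second proposal might look roughly like the sketch below. It assumes that the primary segmentation stays in the String's direct Glyph children and that each StringVariant lists one alternative split. The word, IDs, coordinates and confidences are invented.

```xml
<!-- Hypothetical instance under the second proposal; all values are invented. -->
<String ID="s3" HPOS="100" VPOS="200" WIDTH="66" HEIGHT="30" CONTENT="corn" WC="0.62">
  <!-- Primary split: four glyphs. -->
  <Glyph ID="s3g1" HPOS="100" VPOS="200" WIDTH="16" HEIGHT="30" CONTENT="c"/>
  <Glyph ID="s3g2" HPOS="116" VPOS="200" WIDTH="18" HEIGHT="30" CONTENT="o"/>
  <Glyph ID="s3g3" HPOS="134" VPOS="200" WIDTH="12" HEIGHT="30" CONTENT="r"/>
  <Glyph ID="s3g4" HPOS="146" VPOS="200" WIDTH="20" HEIGHT="30" CONTENT="n"/>
  <!-- Alternative split: three glyphs, where "m" covers the "r" and "n" boxes.
       StringVariantType has only WC, so the String's own box cannot be overridden. -->
  <StringVariant WC="0.31">
    <Glyph ID="s3v1g1" HPOS="100" VPOS="200" WIDTH="16" HEIGHT="30" CONTENT="c"/>
    <Glyph ID="s3v1g2" HPOS="116" VPOS="200" WIDTH="18" HEIGHT="30" CONTENT="o"/>
    <Glyph ID="s3v1g3" HPOS="134" VPOS="200" WIDTH="32" HEIGHT="30" CONTENT="m"/>
  </StringVariant>
</String>
```

The text of the variant ("com") is not stored separately here. A consumer would have to read it off the Glyph CONTENT values.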
Yet a third way would be to extend the existing ALTERNATIVEType to include confidence and glyphs:
<xsd:complexType name="ALTERNATIVEType" mixed="false">
  ...
  <xsd:sequence minOccurs="0">
    <xsd:element name="Glyph" type="GlyphType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="WC" type="WCType" use="optional"/>
</xsd:complexType>
However, this implies a redefinition of ALTERNATIVEType, which is currently described as a variant of writing by new typing / spelling rules.
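For comparison, an instance under this third proposal might look roughly like the sketch below. It shows only what the proposal adds (Glyph children and WC on ALTERNATIVE) and leaves open how the alternative's text would be carried. It also assumes ALTERNATIVE keeps its position before the String's own Glyph children. The word, IDs, coordinates and confidences are invented.

```xml
<!-- Hypothetical instance under the third proposal; all values are invented. -->
<String ID="s4" HPOS="100" VPOS="200" WIDTH="66" HEIGHT="30" CONTENT="corn" WC="0.62">
  <!-- Alternative split carried by an extended ALTERNATIVE element. -->
  <ALTERNATIVE WC="0.31">
    <Glyph ID="s4a1g1" HPOS="100" VPOS="200" WIDTH="16" HEIGHT="30" CONTENT="c"/>
    <Glyph ID="s4a1g2" HPOS="116" VPOS="200" WIDTH="18" HEIGHT="30" CONTENT="o"/>
    <Glyph ID="s4a1g3" HPOS="134" VPOS="200" WIDTH="32" HEIGHT="30" CONTENT="m"/>
  </ALTERNATIVE>
  <!-- Primary split. -->
  <Glyph ID="s4g1" HPOS="100" VPOS="200" WIDTH="16" HEIGHT="30" CONTENT="c"/>
  <Glyph ID="s4g2" HPOS="116" VPOS="200" WIDTH="18" HEIGHT="30" CONTENT="o"/>
  <Glyph ID="s4g3" HPOS="134" VPOS="200" WIDTH="12" HEIGHT="30" CONTENT="r"/>
  <Glyph ID="s4g4" HPOS="146" VPOS="200" WIDTH="20" HEIGHT="30" CONTENT="n"/>
</String>
```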