
OCR correction attributes: CS, ILLS, DBTS

Status: Open. jpmoreux opened this issue 10 years ago (2 comments).

Use cases:

These String-related attributes describe human decisions and actions taken during the OCR text correction process:

• ILLS (boolean, optional): specifies that a word is illegible in the source document (and consequently cannot be corrected). This status can be used:
  - during the production workflow: the quality control process needs to know whether a specific word falls within the guaranteed text-quality perimeter; the status also records that the provider performed a manual task on the word;
  - by viewing software: end users should be told that some words are illegible in the source document itself (this is not an OCR error).

• DBTS (boolean, optional): specifies that a word has been corrected but a doubt remains. It serves the same use cases as ILLS.

These two attributes belong to the "production family" of attributes, alongside CS (Correction Status), which the schema already defines.

Remark: ILLS could also be useful on the TextBlock and TextLine types, for example for:

  • areas of the page with physical defects: stains, blur, etc.
  • areas of the page with scanning defects: curvature near the binding, missing blocks near the margins, etc.

The definition of these attributes must include a usage recommendation: always set an attribute at the highest applicable level (i.e., do not repeat it on every sub-element).
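A hypothetical fragment illustrating the "highest level" recommendation, assuming ILLS were also allowed on TextLine as suggested in the remark (the schema change below defines it only on String). The IDs, coordinates and second word are invented for illustration.

```xml
<!-- Hypothetical: ILLS on TextLine is only proposed, not defined by the schema change below -->
<!-- Recommended: flag the whole illegible line once, on the TextLine -->
<TextLine ID="TL_EXAMPLE_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="2400" ILLS="true">
  <!-- The child Strings inherit the meaning; ILLS is NOT repeated on each of them -->
  <String ID="ST_EXAMPLE_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="992" CONTENT="AnfûràoII"/>
  <SP HPOS="4405" VPOS="296" WIDTH="40"/>
  <String ID="ST_EXAMPLE_2" HPOS="4445" VPOS="296" HEIGHT="448" WIDTH="1368" CONTENT="xvkl"/>
</TextLine>
```

Setting the flag once keeps the file smaller and avoids contradictory values between a line and its words.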

Examples:

<String ID="PAG_00000001_ST000029" STYLEREFS="TXT_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="992" WC="0.34" ILLS="true" CONTENT="AnfûràoII"/>

<String ID="PAG_00000001_ST000029" STYLEREFS="TXT_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="992" WC="0.34" DBTS="true" CONTENT="droits"/> 

Schema change:

<xsd:attribute name="ILLS" type="xsd:boolean" use="optional">
  <xsd:annotation>
    <xsd:documentation>The word is illegible in the source document and cannot be manually corrected. If the content owner considers the word legible, the attribute must be dropped (ILLS="false" is not recommended).</xsd:documentation>
  </xsd:annotation>
</xsd:attribute>
<xsd:attribute name="DBTS" type="xsd:boolean" use="optional">
  <xsd:annotation>
    <xsd:documentation>The word has been manually corrected but a doubt remains. If the content owner considers the doubt not legitimate, the attribute must be dropped (DBTS="false" is not recommended).</xsd:documentation>
  </xsd:annotation>
</xsd:attribute>
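If ILLS and DBTS were later allowed on TextLine and TextBlock as well, one possible way to avoid duplicating the declarations is an XSD attribute group. This is a sketch only: the group name is hypothetical, annotations are omitted for brevity, and it assumes the group is declared in the ALTO target namespace so the unprefixed ref resolves.

```xml
<!-- Hypothetical group name; not part of the ALTO schema -->
<xsd:attributeGroup name="correctionStatusAttributes">
  <xsd:attribute name="ILLS" type="xsd:boolean" use="optional"/>
  <xsd:attribute name="DBTS" type="xsd:boolean" use="optional"/>
</xsd:attributeGroup>

<!-- Then, inside the complexType of each element that should carry the attributes
     (e.g. String, TextLine, TextBlock), after its other attribute declarations: -->
<xsd:attributeGroup ref="correctionStatusAttributes"/>
```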

jpmoreux, Jun 17 '14 14:06