Warn if there is text missing in the ReadingOrder

Open mikegerber opened this issue 4 years ago • 1 comment

For 00451941.gt.xml, dinglehopper-extract does not extract the header text "DE L'ESPRIT DE L'HOMME".

mikegerber avatar May 21 '21 14:05 mikegerber

The header is in TextRegion r3, but the ReadingOrder only includes the main text in r1, so dinglehopper extracts only the main text. In other words, the file is buggy, not dinglehopper.

However, we can do better by warning whenever a region is not included in the extracted text.

mikegerber avatar May 21 '21 14:05 mikegerber
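
A rough sketch of what such a check might look like, using only the Python standard library. This is not dinglehopper's actual code: the function names `regions_missing_from_reading_order` and `warn_about_missing_regions` are made up for illustration, and the sample document is a stripped-down stand-in for a PAGE XML file like the one described above. It assumes that region references inside `ReadingOrder` carry a `regionRef` attribute, and that a file with no `ReadingOrder` at all should not trigger the warning.

```python
import warnings
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def regions_missing_from_reading_order(page_xml):
    """Return the ids of TextRegions that the ReadingOrder does not reference.

    Hypothetical helper: matches elements by local name so that it works
    regardless of which PAGE namespace version the file declares.
    """
    root = ET.fromstring(page_xml)
    region_ids = []
    referenced = set()
    has_reading_order = False
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "TextRegion" and "id" in elem.attrib:
            region_ids.append(elem.attrib["id"])
        elif name == "ReadingOrder":
            has_reading_order = True
            # Collect every regionRef below ReadingOrder, however deeply nested.
            for ref in elem.iter():
                if "regionRef" in ref.attrib:
                    referenced.add(ref.attrib["regionRef"])
    if not has_reading_order:
        # Assumption: without a ReadingOrder there is nothing to be missing from.
        return []
    return [rid for rid in region_ids if rid not in referenced]


def warn_about_missing_regions(page_xml):
    """Emit one warning per TextRegion that the ReadingOrder leaves out."""
    missing = regions_missing_from_reading_order(page_xml)
    for rid in missing:
        warnings.warn(f"TextRegion {rid} is not in the ReadingOrder; its text is skipped")
    return missing


# Minimal stand-in for the situation in this issue: r3 holds the header,
# but only r1 is listed in the ReadingOrder.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <ReadingOrder>
      <OrderedGroup id="ro">
        <RegionRefIndexed index="0" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r3"/>
    <TextRegion id="r1"/>
  </Page>
</PcGts>"""

missing_regions = regions_missing_from_reading_order(SAMPLE)
```

The sketch only compares ids; it does not check whether a skipped region actually contains text, and it treats nested regions as independent, so a real implementation would probably want to refine both points.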