Giant Text Selection layer for Table of Contents page
Description
The text layer for certain table of contents pages in books renders as giant text. This becomes visible when the translate plugin is turned on: the text layers appear overlapping and oversized. This is likely a pre-existing issue within the text selection plugin.
Evidence / Screenshot (if possible)
https://archive.org/details/illustratedbooko00robe/page/n11/mode/2up
In the screenshot below, the BookReader CSS has been adjusted to show the odd location of the text "XIV. — The Barb ......"
Expectation
With translations enabled, the text layer should closely match the size and location of the text content on the page.
Context
This issue has been observed in Chrome and Firefox on Windows 10 and 11.
Stakeholders
@schu96 @cdrini
Looking into this issue, it appears to originate upstream in the OCR process, where the left and top are reported as 0px.
The djvu.xml file associated with this particular work shows the following values:
<PARAGRAPH>
<LINE>
<WORD coords="0,2295,338,0" x-confidence="95">XIII. </WORD>
<WORD coords="344,2286,390,2281" x-confidence="0">—</WORD>
<WORD coords="391,2295,503,2263" x-confidence="63">The </WORD>
<WORD coords="504,2296,804,2264" x-confidence="57">Dragoon </WORD>
<WORD coords="805,2297,1426,2288" x-confidence="33">.....</WORD>
</LINE>
</PARAGRAPH>
BookReader interprets the coords of the first word as ocrLeft 0px, ocrBottom 2295px, ocrRight 338px, and ocrTop 0px, so its box stretches from the top of the page down to 2295px.
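To make that mapping concrete, here is a minimal sketch that splits a `coords` attribute in the order left, bottom, right, top. The `OcrBox` type and `parseCoords` function are hypothetical names for illustration, not BookReader's actual code.

```typescript
// Hypothetical shape for one OCR word box; field names mirror the ones above.
interface OcrBox {
  ocrLeft: number;
  ocrBottom: number;
  ocrRight: number;
  ocrTop: number;
}

// Split a djvu.xml coords attribute ("left,bottom,right,top") into numbers.
// Missing components become NaN rather than silently defaulting to 0.
function parseCoords(coords: string): OcrBox {
  const parts = coords.split(",").map(Number);
  return {
    ocrLeft: parts[0] ?? NaN,
    ocrBottom: parts[1] ?? NaN,
    ocrRight: parts[2] ?? NaN,
    ocrTop: parts[3] ?? NaN,
  };
}

// The first WORD from the XML above.
const box = parseCoords("0,2295,338,0");
// With ocrTop at 0, the box height covers everything above the baseline.
const height = box.ocrBottom - box.ocrTop;
```

For comparison, the neighbouring word "The " has coords 391,2295,503,2263, which gives a height of only 32px.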
A simple approach to this issue is to drop words that report 0px for ocrLeft and ocrTop and render only the remaining words, which resolves the sizing and position issue for both the text selection layer and the translated text layer.
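A rough sketch of that filter follows. The `OcrWord` type and both function names are hypothetical, and treating a word as bad only when both offsets are 0 is an assumption; the actual fix may use a different check.

```typescript
// Hypothetical word shape: the text plus the two offsets that matter here.
interface OcrWord {
  text: string;
  ocrLeft: number;
  ocrTop: number;
}

// Assumption: a word is considered bogus only when BOTH offsets are 0.
function hasBogusOrigin(word: OcrWord): boolean {
  return word.ocrLeft === 0 && word.ocrTop === 0;
}

// Keep only the words that are safe to render in the text layer.
function renderableWords(words: OcrWord[]): OcrWord[] {
  return words.filter((word) => !hasBogusOrigin(word));
}

// Three words taken from the LINE shown earlier.
const line: OcrWord[] = [
  { text: "XIII. ", ocrLeft: 0, ocrTop: 0 },
  { text: "The ", ocrLeft: 391, ocrTop: 2263 },
  { text: "Dragoon ", ocrLeft: 504, ocrTop: 2264 },
];
const kept = renderableWords(line).map((word) => word.text);
```

The trade-off is that a dropped word can no longer be selected or translated, which seems acceptable here since its reported position is unusable anyway.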
Closed by #1460