
Giant Text Selection layer for Table of Contents page

Open schu96 opened this issue 1 month ago • 1 comments

Description

The text layer for certain table of contents pages is rendered as giant text. This becomes visible when the translate plugin is turned on: the text layers appear oversized and overlapping. This is likely a pre-existing issue in the text selection plugin.

Evidence / Screenshot (if possible)

https://archive.org/details/illustratedbooko00robe/page/n11/mode/2up

Image

In the screenshot below, the BookReader CSS has been adjusted to show the odd location of the text "XIV. — The Barb ......" Image

Expectation

With translations enabled, the text layer should closely match the size and location of the text content on the page.

Context

This issue has been observed on Windows 10 and 11 in both Chrome and Firefox.

Stakeholders

@schu96 @cdrini

schu96 avatar Nov 20 '25 01:11 schu96

Looking into this issue, it appears to originate upstream in the OCR process, where the left and top of a word are reported as 0px.

The djvu.xml file associated with this particular work shows the following values:

<PARAGRAPH>
  <LINE>
    <WORD coords="0,2295,338,0" x-confidence="95">XIII. </WORD>
    <WORD coords="344,2286,390,2281" x-confidence="0">—</WORD>
    <WORD coords="391,2295,503,2263" x-confidence="63">The </WORD>
    <WORD coords="504,2296,804,2264" x-confidence="57">Dragoon </WORD>
    <WORD coords="805,2297,1426,2288" x-confidence="33">.....</WORD>
  </LINE>
</PARAGRAPH>

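BookReader interprets the first word's coords as ocrLeft 0px, ocrBottom 2295px, ocrRight 338px, and ocrTop 0px. Its box therefore stretches from the top-left corner of the page down to the line and is 2295px tall, while the neighboring words are about 32px tall.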
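To make the mapping concrete, here is a minimal sketch of how a `coords` attribute could be read as left, bottom, right, top, following the interpretation described above. The names `OcrBox`, `parseCoords`, and `boxHeight` are hypothetical and are not taken from the BookReader source.

```typescript
// Hypothetical shape for one OCR word box, in page pixels.
interface OcrBox {
  left: number;
  bottom: number;
  right: number;
  top: number;
}

// Parse a djvu.xml WORD `coords` attribute, read as "left,bottom,right,top"
// (the order described in this thread).
function parseCoords(coords: string): OcrBox {
  const parts = coords.split(",").map((s) => Number(s));
  if (parts.length !== 4 || parts.some((n) => Number.isNaN(n))) {
    throw new Error(`Unexpected coords value: ${coords}`);
  }
  const [left, bottom, right, top] = parts as [number, number, number, number];
  return { left, bottom, right, top };
}

// Height of the box in pixels (bottom is further down the page than top).
const boxHeight = (b: OcrBox): number => b.bottom - b.top;

const bad = parseCoords("0,2295,338,0"); // "XIII. " from the snippet above
const good = parseCoords("391,2295,503,2263"); // "The " from the snippet above
console.log(boxHeight(bad), boxHeight(good)); // prints 2295 32
```

The difference in height (2295px against 32px) is what makes the first word's text layer appear giant.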

A simple approach to this issue is to drop such words entirely and render only words that do not have 0px values for ocrLeft and ocrTop. This resolves the sizing and position issue for both the text selection layer and the translated text layer.
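The filtering idea can be sketched roughly as follows. This is an illustration only, not the change that was merged: `OcrWord`, `hasSuspectOrigin`, and `renderableWords` are hypothetical names, and the sketch assumes a word is dropped only when both its left and top are 0.

```typescript
// Hypothetical word record: text plus its OCR box in page pixels.
interface OcrWord {
  text: string;
  left: number;
  bottom: number;
  right: number;
  top: number;
}

// Assumption: a box pinned to the page's top-left corner (left and top both 0)
// is treated as a bad OCR result.
function hasSuspectOrigin(w: OcrWord): boolean {
  return w.left === 0 && w.top === 0;
}

// Keep only the words that are safe to place in the text layer.
function renderableWords(words: OcrWord[]): OcrWord[] {
  return words.filter((w) => !hasSuspectOrigin(w));
}

// The line from the djvu.xml snippet quoted earlier in this thread.
const line: OcrWord[] = [
  { text: "XIII. ", left: 0, bottom: 2295, right: 338, top: 0 },
  { text: "—", left: 344, bottom: 2286, right: 390, top: 2281 },
  { text: "The ", left: 391, bottom: 2295, right: 503, top: 2263 },
  { text: "Dragoon ", left: 504, bottom: 2296, right: 804, top: 2264 },
  { text: ".....", left: 805, bottom: 2297, right: 1426, top: 2288 },
];

const kept = renderableWords(line);
console.log(kept.length); // prints 4
```

One trade-off of this approach is that a dropped word, such as "XIII." here, is no longer available for selection or translation.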

Image Image

schu96 avatar Dec 03 '25 05:12 schu96

Closed by #1460

cdrini avatar Dec 16 '25 21:12 cdrini