Generated HTML page does not work correctly with "Text fragments" in the URL for multi-word fragments

Open jmozmoz opened this issue 1 year ago • 1 comments

Newer browsers support so-called text fragments in URLs, which select a piece of text on a newly opened page. See:

  • https://wicg.github.io/scroll-to-text-fragment/
  • https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments

This also works with HTML pages generated by pdf2htmlEX, but only if a single word is used. Example:

  • https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=statically

But it does not work if more than one word is used, e.g.:

  • https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=statically%20typed

A multi-word fragment does work on an "original" HTML page, e.g.:

  • https://www.mozilla.org/en-US/#:~:text=we%20create%20prioritize

Should this work with the pages generated by pdf2htmlEX? Does the URL need to be formulated in another way? Or is there a fundamental reason why this cannot work?

I tested it with Chrome 130.0.6723.70 and Edge 130.0.2849.68.

jmozmoz avatar Nov 06 '24 14:11 jmozmoz

Chrome's implementation of text fragments seems to work only for single words separated by a comma, which is the textStart,textEnd range form of the syntax. So https://www.mozilla.org/en-US/#:~:text=tech,that works, and https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=statically,typed works as well.
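To make the two URL shapes concrete, here is a minimal sketch that builds both the phrase form and the comma-separated range form. The helper names `encodeTerm` and `textFragmentUrl` are made up for illustration and are not part of pdf2htmlEX or any browser API; only `encodeURIComponent` is a real built-in.

```javascript
// Percent-encode one term of a text directive. encodeURIComponent already
// handles spaces, "," and "&"; it leaves "-" alone, but "-" is syntax inside
// a text directive (prefix-,  ,-suffix), so it is encoded by hand here.
function encodeTerm(term) {
  return encodeURIComponent(term).replace(/-/g, "%2D");
}

// Hypothetical helper: append "#:~:text=textStart[,textEnd]" to a page URL.
function textFragmentUrl(pageUrl, textStart, textEnd) {
  const base = pageUrl.split("#")[0]; // drop any fragment already present
  let directive = "text=" + encodeTerm(textStart);
  if (textEnd !== undefined) {
    // Range form: match from textStart through textEnd.
    directive += "," + encodeTerm(textEnd);
  }
  return base + "#:~:" + directive;
}

const demo = "https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html";
const phraseUrl = textFragmentUrl(demo, "statically typed");   // one exact phrase
const rangeUrl = textFragmentUrl(demo, "statically", "typed"); // start word, end word
console.log(phraseUrl);
console.log(rangeUrl);
```

The phrase form asks the browser to find the literal string including the space, while the range form only needs each word to be findable on its own, which may be why it is more forgiving on generated pages.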

Note that the "worse" problem is that all non-visible pages are hidden by the pdf2htmlEX JavaScript (for good reason: much faster initial rendering), so Ctrl-F (and text fragments) won't find text on anything but the initially visible page. For instance, https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=underway,areas will only highlight the text if you're already viewing section 10, and Ctrl-F "underway" won't work if you're viewing page 1.

A hack I've found for the JavaScript is to replace all instances of hide() with show() and to add an early return to the pre_hide_pages function, e.g. pre_hide_pages:function(){return; ... .

The HTML attribute hidden="until-found" may be what we want to use for hiding pages instead of display:none.
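A rough sketch of what the hidden="until-found" idea could look like, not a patch against the real pdf2htmlEX code. The helper names `hideUntilFound` and `supportsUntilFound` and the `.pf` page selector are assumptions for illustration; the `hidden="until-found"` attribute and the `beforematch` event are the standard browser features.

```javascript
// Hypothetical helper: hide one page container while keeping it searchable.
// Assumes `page` is the element that would otherwise get display:none; any
// display:none rule still applying to it must be removed, or the attribute
// has no effect.
function hideUntilFound(page, onReveal) {
  page.setAttribute("hidden", "until-found");
  // The browser fires "beforematch" right before it un-hides the element
  // for find-in-page or fragment navigation.
  page.addEventListener("beforematch", () => onReveal(page), { once: true });
}

// Feature check: browsers without support treat the attribute like plain
// `hidden`, so callers may want to fall back to showing everything.
function supportsUntilFound(doc) {
  return "onbeforematch" in doc.body;
}

// Usage sketch, only run in a browser. ".pf" as the page selector is an assumption.
if (typeof document !== "undefined" && supportsUntilFound(document)) {
  document.querySelectorAll(".pf").forEach((page, index) => {
    if (index > 0) {
      hideUntilFound(page, (p) => console.log("revealed by search:", p.id));
    }
  });
}
```

The `onReveal` callback is the place where the viewer's own bookkeeping (for example, marking the page as shown) would need to be updated so it stays in sync with what the browser un-hid.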

a2intl avatar Dec 06 '25 07:12 a2intl