Generated HTML page does not work correctly with "text fragments" in the URL for multi-word fragments
Newer browsers support so-called text fragments in URLs, which scroll to and highlight a piece of text on a newly opened page. See:
- https://wicg.github.io/scroll-to-text-fragment/
- https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments
This also works with HTML pages generated by pdf2htmlEX, but only if a single word is used. Example:
- https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=statically
But it does not work if more than one word is used, e.g.:
- https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=statically%20typed
Multi-word fragments do work on an "original" HTML page, e.g.:
- https://www.mozilla.org/en-US/#:~:text=we%20create%20prioritize
Should this work with the pages generated by pdf2htmlEX? Does the URL need to be formulated in another way? Or is there a fundamental reason why this cannot work?
I tested it with Chrome 130.0.6723.70 and Edge 130.0.2849.68.
Chrome's implementation of text fragments only works for single words separated by a comma, so https://www.mozilla.org/en-US/#:~:text=tech,that works, and https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=statically,typed also works.
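For reference, in the text fragment syntax documented on the MDN page linked above, a comma separates the start and the end of the range to match, which is why the two-word comma form selects everything from the first word to the second. A minimal sketch of a helper that builds both URL forms might look like the following; the function name `buildTextFragmentUrl` is hypothetical and not part of pdf2htmlEX or any browser API.

```javascript
// Hypothetical helper (not part of pdf2htmlEX): builds a text fragment URL.
// With only textStart it requests an exact phrase match; with textEnd as well
// it requests a range match from textStart through textEnd.
function buildTextFragmentUrl(pageUrl, textStart, textEnd) {
  // encodeURIComponent escapes spaces, "," and "&", but leaves "-" alone.
  // The text directive syntax uses "-" to mark prefix/suffix, so escape it too.
  const encode = (s) => encodeURIComponent(s).replace(/-/g, '%2D');

  let directive = 'text=' + encode(textStart);
  if (textEnd !== undefined) {
    directive += ',' + encode(textEnd);
  }

  // Drop any fragment already present on the page URL.
  const base = pageUrl.split('#')[0];
  return base + '#:~:' + directive;
}

const demo = 'https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html';
console.log(buildTextFragmentUrl(demo, 'statically typed'));
console.log(buildTextFragmentUrl(demo, 'statically', 'typed'));
```

The first call produces the phrase form from the original report, the second the comma form that was observed to work on the demo page.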
Note that the "worse" problem is that the pdf2htmlEX JavaScript hides all non-visible pages (for good reason: initial rendering is much faster), so Ctrl+F (and text fragments) won't find text on anything but the initially visible page. For instance, https://pdf2htmlex.github.io/pdf2htmlEX/demo/demo.html#:~:text=underway,areas will only highlight the text if you're already viewing section 10, and Ctrl+F for "underway" won't work if you're viewing page 1.
A hack I've found for the JavaScript is to replace all instances of hide() with show() and add an early return to the pre_hide_pages function, e.g. pre_hide_pages:function(){return; ... .
The HTML attribute hidden="until-found" may be what we want to use for hiding pages instead of display:none.
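The hack described above (swapping hide() for show() and returning early from pre_hide_pages) could be applied as a plain text substitution on the viewer script. The sketch below is only an illustration: the function name `patchViewerScript` is hypothetical, and the exact strings it looks for are assumptions taken from this comment, so a differently minified build may need other patterns.

```javascript
// Hypothetical helper: applies the two edits from the hack to the text of the
// pdf2htmlEX viewer script. The search strings are assumptions based on the
// comment above, not a documented pdf2htmlEX interface.
function patchViewerScript(source) {
  return source
    // Turn every hide() call into show(), so no page content gets hidden.
    .split('hide()').join('show()')
    // Make pre_hide_pages return immediately, before it hides anything.
    .split('pre_hide_pages:function(){')
    .join('pre_hide_pages:function(){return;');
}

// Tiny made-up input that mimics the shape of the minified code.
const sample = 'pre_hide_pages:function(){a.hide()},other:function(){b.hide()}';
console.log(patchViewerScript(sample));
```

Note that this trades away the faster initial rendering that hiding the pages was meant to provide.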