No space between words when annotating some PDFs

Open diegodlh opened this issue 4 years ago • 4 comments

Steps to reproduce

  1. Open PDF https://github.com/mozilla/pdf.js/files/2955999/1.pdf in the browser (Firefox or Chrome).
  2. Launch the Hypothes.is client.
  3. Select a few words to annotate, either in the middle of a line or spanning from one line to the next.

Expected behaviour

Spaces between words should be observed in the annotation excerpt. For example: "Open Sans is a humanist sans serif typeface"

Actual behaviour

Instead, spaces between words are lost: "OpenSansisahumanistsansseriftypeface"

Browser/system information

This happens both in Firefox's native PDF viewer with the Hypothesis bookmarklet and in Chrome with the Hypothesis extension.

Additional details

This seems to be a well-known issue with PDF.js. See issues https://github.com/mozilla/pdf.js/issues/10640, https://github.com/mozilla/pdf.js/issues/6657, https://github.com/mozilla/pdf.js/issues/10110.

I have found several PDFs like this. Cases where the space at the end of a line is lost seem more common (https://github.com/mozilla/pdf.js/issues/10110).

diegodlh avatar Apr 18 '20 03:04 diegodlh

As expected, the same thing happens when opening a file through Via. See for example here: https://via.hypothes.is/https://gahp.net/wp-content/uploads/2017/09/sample.pdf. Try to annotate "Actually you should do it" at the end of page 6.

diegodlh avatar Apr 18 '20 03:04 diegodlh

The situation should have improved greatly in pdf.js since https://github.com/mozilla/pdf.js/pull/13257.

marco-c avatar May 04 '21 10:05 marco-c

Thanks for the notice! We might have some work to do when we upgrade PDF.js to ensure that annotations don't orphan in documents that were heavily affected by these spacing issues.
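
To illustrate the orphaning concern: a quote saved as "OpenSansisahumanist" under the old PDF.js would no longer be an exact substring of the corrected text "Open Sans is a humanist ...". Below is a minimal sketch of whitespace-insensitive quote matching that would tolerate this. The helper names `stripWhitespace` and `findQuoteIgnoringSpaces` are hypothetical and written only for this example; they are not part of the Hypothesis client or PDF.js.

```typescript
// Hypothetical helper (an assumption for illustration only).
// Removes all whitespace from `text` and records, for each kept character,
// its offset in the original string.
function stripWhitespace(text: string): { stripped: string; offsets: number[] } {
  let stripped = "";
  const offsets: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (!/\s/.test(ch)) {
      stripped += ch;
      offsets.push(i);
    }
  }
  return { stripped, offsets };
}

// Hypothetical helper (an assumption for illustration only).
// Finds `quote` in `text` while ignoring whitespace on both sides.
// Returns [start, end) offsets into the original `text`, or null if not found.
function findQuoteIgnoringSpaces(
  text: string,
  quote: string
): [number, number] | null {
  const haystack = stripWhitespace(text);
  const needle = stripWhitespace(quote).stripped;
  if (needle.length === 0) {
    return null;
  }
  const index = haystack.stripped.indexOf(needle);
  if (index === -1) {
    return null;
  }
  // Map the match in the stripped string back to offsets in the original text.
  const start = haystack.offsets[index];
  const end = haystack.offsets[index + needle.length - 1] + 1;
  return [start, end];
}

// Quote saved while spaces were being lost, matched against corrected text.
const pageText = "Open Sans is a humanist sans serif typeface";
const savedQuote = "OpenSansisahumanist";
const range = findQuoteIgnoringSpaces(pageText, savedQuote);
if (range !== null) {
  console.log(pageText.slice(range[0], range[1]));
}
```

This only returns the first exact match after stripping whitespace, so it does not handle repeated phrases or other text differences between PDF.js versions. It is meant to show the shape of the problem rather than a proposed fix.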

robertknight avatar May 04 '21 12:05 robertknight

The situation is indeed improved in the latest version of PDF.js. I think we can consider this issue fixed when:

  • https://github.com/hypothesis/client/pull/3687 is resolved
  • The browser extension's version of PDF.js has been updated. See https://github.com/hypothesis/browser-extension/pull/631.
  • Via's version of PDF.js has been updated.

robertknight avatar Aug 23 '21 17:08 robertknight