No space between words when annotating some PDFs
Steps to reproduce
- Open PDF https://github.com/mozilla/pdf.js/files/2955999/1.pdf in the browser (Firefox or Chrome).
- Launch the Hypothes.is client.
- Select a few words to annotate, either in the middle of a line or spanning from one line to the next.
Expected behaviour
Spaces between words should be observed in the annotation excerpt. For example: "Open Sans is a humanist sans serif typeface"
Actual behaviour
Instead, spaces between words are lost: "OpenSansisahumanistsansseriftypeface"
Browser/system information
This happens both in Firefox's native PDF viewer with the Hypothesis bookmarklet and in Chrome with the Hypothesis extension.
Additional details
This seems to be a well-known issue with PDF.js. See issues https://github.com/mozilla/pdf.js/issues/10640, https://github.com/mozilla/pdf.js/issues/6657, https://github.com/mozilla/pdf.js/issues/10110.
I have found several PDFs like this. Cases where the space at the end of a line is lost seem more common (https://github.com/mozilla/pdf.js/issues/10110).
As expected, the same thing happens when opening a file through Via. See, for example, https://via.hypothes.is/https://gahp.net/wp-content/uploads/2017/09/sample.pdf. Try to annotate "Actually you should do it" at the end of page 6.
The situation should have improved greatly in pdf.js since https://github.com/mozilla/pdf.js/pull/13257.
Thanks for the notice! We might have some work to do when we upgrade PDF.js to ensure that existing annotations don't become orphaned in documents that were heavily affected by these spacing issues.
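To illustrate the orphaning concern: a quote saved as "OpenSansisahumanist" will no longer match exactly once the upgraded text layer contains "Open Sans is a humanist". One possible way to tolerate this is to compare the quote and the page text with whitespace removed, then map the match back to offsets in the original text. The sketch below is only an illustration of that idea; `stripWhitespace` and `findQuoteIgnoringSpaces` are hypothetical names, not functions from the Hypothesis client or PDF.js, and this is not necessarily the approach taken in the linked pull requests.

```typescript
// Hypothetical helpers for illustration only; not part of the Hypothesis
// client or PDF.js.

// Remove all whitespace from `text` and record, for every character kept,
// its index in the original string.
function stripWhitespace(text: string): { stripped: string; offsets: number[] } {
  let stripped = "";
  const offsets: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (!/\s/.test(ch)) {
      stripped += ch;
      offsets.push(i);
    }
  }
  return { stripped, offsets };
}

// Find the first occurrence of `quote` in `pageText`, ignoring any
// whitespace differences between the two. Returns half-open [start, end)
// offsets into the original `pageText`, or null if there is no match.
function findQuoteIgnoringSpaces(
  pageText: string,
  quote: string
): { start: number; end: number } | null {
  const page = stripWhitespace(pageText);
  const needle = stripWhitespace(quote).stripped;
  if (needle.length === 0) {
    return null;
  }
  const idx = page.stripped.indexOf(needle);
  if (idx === -1) {
    return null;
  }
  return {
    start: page.offsets[idx],
    // Offset of the last matched character, plus one to make the end exclusive.
    end: page.offsets[idx + needle.length - 1] + 1,
  };
}
```

For example, calling `findQuoteIgnoringSpaces("Open Sans is a humanist sans serif typeface", "OpenSansisahumanist")` returns a range whose slice of the page text is "Open Sans is a humanist". The trade-off is that ignoring whitespace can produce false matches in rare cases (for instance "no table" versus "notable"), so a real implementation would likely combine this with other selectors such as position or surrounding context.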
The situation is indeed improved in the latest version of PDF.js. I think we can consider this issue fixed when:
- https://github.com/hypothesis/client/pull/3687 is resolved
- The browser extension's version of PDF.js has been updated. See https://github.com/hypothesis/browser-extension/pull/631.
- Via's version of PDF.js has been updated.