Web Clipper fails to import scientific webpages correctly
Operating system
Linux
Joplin version
3.5.7
Desktop version info
Joplin 3.5.7 (prod, linux) Web Clipper Version 2.11.2
Current behaviour
Joplin's Web Clipper often fails to clip scientific webpages correctly when they contain footnote or citation references as hyperlinks. The text of the clipped note becomes malformed and hard to read. Many pages are affected, especially those from https://www.nature.com.
Steps to reproduce
- open a webpage (e.g. https://doi.org/10.1038/s41579-022-00818-6 or the minimal example joplin-webclipper-problem.html)
- clip the simplified page
- preview the note in Joplin
Problem
- text is imported with unexpected line breaks
- indentation appears to be rendered as a code block
- as a result, the text becomes hard to read
- the same happens in both editors (Markdown and Rich Text)
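As background for the code-block symptom: in CommonMark, a line indented by four or more columns opens an indented code block when it does not continue a paragraph, for example when it follows a blank line. The Python sketch below flags such lines in clipped Markdown. `indented_code_candidates` is a hypothetical helper, not part of Joplin, and it is a simplified check that assumes the text contains no lists, block quotes or other container blocks.

```python
def indented_code_candidates(markdown: str) -> list[int]:
    """Return 1-based numbers of lines that would open an indented code block.

    Simplified CommonMark rule (assumes no lists, block quotes or other
    container blocks): a non-blank line indented by 4+ columns opens an
    indented code block unless it continues a paragraph, i.e. it must be
    the first line or follow a blank line.
    """
    hits: list[int] = []
    prev_blank = True  # the start of the document behaves like a blank line
    for number, line in enumerate(markdown.splitlines(), start=1):
        expanded = line.expandtabs(4)  # treat tabs as 4-column tab stops
        indent = len(expanded) - len(expanded.lstrip(" "))
        blank = expanded.strip() == ""
        if not blank and prev_blank and indent >= 4:
            hits.append(number)
        prev_blank = blank
    return hits


# Hypothetical clipped output: a stray blank line followed by indented text.
clipped = "Cells slow down translation\n\n    under hyperosmotic stress.\n    More text"
print(indented_code_candidates(clipped))  # [3]
```

An indented line that directly follows ordinary text (no blank line in between) is treated as paragraph continuation and is not flagged, which matches the CommonMark rule that indented code cannot interrupt a paragraph.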
Example screenshot
Potential cause
- line breaks in the source HTML could be responsible for the incorrect import
- the hyperlink tag <a></a> contains multi-line content (here, in its title attribute):
<a data-track="click" data-track-action="reference anchor" data-track-label="link" data-test="citation-ref" aria-label="Reference 9" title="Dai, X. et al. Slowdown of translational elongation in Escherichia coli under hyperosmotic stress. mBio
https://doi.org/10.1128/mBio.02375-17
(2018)." href="/articles/s41579-022-00818-6#ref-CR9" id="ref-link-section-d481596181e432">9</a>
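One way to confirm that the anchor's title attribute really carries line breaks is to parse the snippet with Python's standard `html.parser`. This is only a diagnostic sketch; `MultilineAttrFinder` is a hypothetical name, and the snippet is the anchor quoted above.

```python
from html.parser import HTMLParser

# The citation anchor from the report, with the title's line breaks intact.
SNIPPET = """<a data-track="click" data-track-action="reference anchor" data-track-label="link" data-test="citation-ref" aria-label="Reference 9" title="Dai, X. et al. Slowdown of translational elongation in Escherichia coli under hyperosmotic stress. mBio
https://doi.org/10.1128/mBio.02375-17
(2018)." href="/articles/s41579-022-00818-6#ref-CR9" id="ref-link-section-d481596181e432">9</a>"""


class MultilineAttrFinder(HTMLParser):
    """Collects (tag, attribute, value) for attribute values containing line breaks."""

    def __init__(self) -> None:
        super().__init__()
        self.hits: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for bare attributes such as <input disabled>
            if value is not None and "\n" in value:
                self.hits.append((tag, name, value))


finder = MultilineAttrFinder()
finder.feed(SNIPPET)
finder.close()
for tag, name, value in finder.hits:
    print(f"<{tag} {name}=...> spans {len(value.splitlines())} lines")
```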
Expected behaviour
- the text should be displayed as it appears on the webpage
- possible fix: handle hyperlinks whose content spans multiple lines during import?
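The suggested fix could look roughly like the following Python sketch: before emitting an inline Markdown link, collapse all whitespace in the link title (including line breaks) into single spaces so the link stays on one line. `normalize_title` and `markdown_link` are hypothetical helpers for illustration. Whether and how Joplin's HTML-to-Markdown conversion emits link titles is an assumption here, not something this report confirms.

```python
import re
from typing import Optional


def normalize_title(title: Optional[str]) -> str:
    """Collapse every whitespace run (spaces, tabs, newlines) into one space."""
    if not title:
        return ""
    return re.sub(r"\s+", " ", title).strip()


def markdown_link(text: str, href: str, title: Optional[str] = None) -> str:
    """Build an inline Markdown link that always stays on a single line.

    Hypothetical helper for illustration; not Joplin's actual converter.
    """
    clean = normalize_title(title)
    if not clean:
        return f"[{text}]({href})"
    # Backslash-escape double quotes so they cannot end the title early.
    clean = clean.replace('"', '\\"')
    return f'[{text}]({href} "{clean}")'


# Title attribute of the citation anchor from the report, line breaks included.
TITLE = (
    "Dai, X. et al. Slowdown of translational elongation in Escherichia coli "
    "under hyperosmotic stress. mBio\n"
    "https://doi.org/10.1128/mBio.02375-17\n"
    "(2018)."
)
print(markdown_link("9", "/articles/s41579-022-00818-6#ref-CR9", TITLE))
```

Dropping the title entirely would be another option; collapsing the whitespace instead keeps the citation text available as a hover tooltip in renderers that map Markdown link titles to the HTML title attribute.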
Logs
No response