web clipper fails to import scientific webpages correctly

Open jotech opened this issue 4 months ago • 0 comments

Operating system

Linux

Joplin version

3.5.7

Desktop version info

Joplin 3.5.7 (prod, linux) Web Clipper Version 2.11.2

Current behaviour

Joplin's Web Clipper often has problems clipping scientific webpages whose footnotes are rendered as hyperlinks. The clipped text comes out malformed, which makes the notes hard to read. Many pages are affected, especially those from https://www.nature.com.

Steps to reproduce

  • open a webpage (e.g. https://doi.org/10.1038/s41579-022-00818-6 or the minimal example joplin-webclipper-problem.html)
  • clip the simplified page
  • preview the note in Joplin

Problem

  • text is imported with unwanted line breaks
  • the indentation seems to turn the following text into a code block
  • text becomes hard to read
  • the result is the same in both editors

Example screenshot: [Image]

Potential cause

  • a line break in the source HTML could be responsible for the incorrect import
  • the title attribute of the hyperlink tag <a></a> spans multiple lines, for example:
<a data-track="click" data-track-action="reference anchor" data-track-label="link" data-test="citation-ref" aria-label="Reference 9" title="Dai, X. et al. Slowdown of translational elongation in Escherichia coli under hyperosmotic stress. mBio 
                  https://doi.org/10.1128/mBio.02375-17
                  
                 (2018)." href="/articles/s41579-022-00818-6#ref-CR9" id="ref-link-section-d481596181e432">9</a>
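
To check whether a given page contains this pattern, something like the following could help. It is only a diagnostic sketch using Python's standard `html.parser`; the class and function names (`MultiLineAttrFinder`, `find_multiline_attrs`) are made up for illustration and are not part of Joplin or the Web Clipper.

```python
from html.parser import HTMLParser


class MultiLineAttrFinder(HTMLParser):
    """Collects (tag, attribute) pairs whose attribute value contains a newline."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes.
        for name, value in attrs:
            if value is not None and "\n" in value:
                self.hits.append((tag, name))


def find_multiline_attrs(html: str):
    """Return every (tag, attribute) pair with a multi-line value (hypothetical helper)."""
    finder = MultiLineAttrFinder()
    finder.feed(html)
    finder.close()
    return finder.hits
```

Running `find_multiline_attrs` on the saved HTML of a page that clips badly should list the `a`/`title` pairs if the multi-line attribute is indeed what triggers the problem.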

Expected behaviour

  • the text should be displayed as it appears on the webpage
  • a potential fix: handle hyperlinks whose attributes span multiple lines during import?
Image
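
A rough sketch of the kind of fix meant above, assuming the clipper carries the `title` attribute over into the Markdown link (that part is a guess, I have not checked the converter code). CommonMark allows a link title to span several lines but not to contain a blank line, so the whitespace-only line inside the title would break the link, and the indented line after it could then be read as a code block. Collapsing all whitespace in the title before writing the link avoids both. The function names here are hypothetical.

```python
import re


def collapse_whitespace(value: str) -> str:
    """Replace each run of whitespace (including newlines) with one space."""
    return re.sub(r"\s+", " ", value).strip()


def markdown_link(text: str, href: str, title: str = "") -> str:
    """Build an inline Markdown link whose title stays on a single line."""
    clean = collapse_whitespace(title)
    # Escape backslashes first, then double quotes, so the title stays quoted.
    clean = clean.replace("\\", "\\\\").replace('"', '\\"')
    if clean:
        return f'[{text}]({href} "{clean}")'
    return f"[{text}]({href})"
```

Applied to the reference link from the example, this yields a single-line link with the citation text as its title, so nothing after it is indented or separated by a blank line.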

Logs

No response

jotech, Dec 08 '25 12:12