
[Bug Report] Abstract ignores line breaks

Open Gru-gru opened this issue 1 year ago • 1 comments

Describe the bug

When a paper is imported from arXiv, the abstract displayed by Episciences can differ from the one shown on arXiv, because line breaks are lost. Concrete example: https://theoretics.episciences.org/14397 vs https://arxiv.org/abs/2311.10204

Expected behavior

Line breaks should not be ignored, so that the abstract is shown as the authors intended.

Gru-gru, Oct 04 '24 08:10

Yes indeed, thank you for reporting this. I'm adding notes to the bug report; we need to explore a few options to fix this.

This is what arXiv provides on the web: [image] a few line breaks (<br>)

This is what arXiv provides through the API: [image] Lines seem to be wrapped at 80 characters at most, which introduces unwanted line breaks.

Source: http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:2311.10204&metadataPrefix=arXivRaw
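For reference, a minimal Python sketch of pulling the raw abstract text out of such a record. It assumes the record contains an element whose local name is `abstract` and matches it regardless of XML namespace; the function name `extract_abstract` and the sample record are illustrative only, not Episciences code.

```python
import xml.etree.ElementTree as ET


def extract_abstract(xml_text):
    """Return the text of the first <abstract> element, or None if absent.

    The tag is matched by local name only, so the exact XML namespace of
    the record does not need to be known (an assumption made for brevity).
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        if element.tag.rsplit("}", 1)[-1] == "abstract":
            return element.text
    return None


# Hypothetical, heavily trimmed record used only to exercise the function.
SAMPLE_RECORD = """<record xmlns:r="urn:example:raw">
  <r:abstract>First line of the abstract,
wrapped by the API.</r:abstract>
</record>"""

if __name__ == "__main__":
    print(repr(extract_abstract(SAMPLE_RECORD)))
```

The returned string still contains the line feeds exactly as delivered, which is the input the options below have to deal with.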

This is why we are not merely replacing line feeds (\n) with HTML line breaks (<br>).

The result would look like this: [image]

This is what the DataCite API provides: [image]

curl -s https://api.datacite.org/dois/10.48550/arXiv.2311.10204 | jq . | grep '"description"' | grep --color '\\\n'
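The same lookup could be sketched in Python as follows. This assumes the DataCite response keeps its descriptions under `data.attributes.descriptions` and that the abstract is the entry whose `descriptionType` is `Abstract`; the helper names `abstract_from_datacite` and `fetch_abstract` are hypothetical.

```python
import json
import urllib.request

DATACITE_URL = "https://api.datacite.org/dois/{doi}"


def abstract_from_datacite(payload):
    """Return the first description tagged as an abstract, or None."""
    attributes = payload.get("data", {}).get("attributes", {})
    for entry in attributes.get("descriptions") or []:
        # Assumption: the abstract is marked with descriptionType "Abstract".
        if entry.get("descriptionType") == "Abstract":
            return entry.get("description")
    return None


def fetch_abstract(doi):
    """Download the DataCite record for `doi` and extract its abstract."""
    url = DATACITE_URL.format(doi=doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return abstract_from_datacite(json.load(response))
```

Calling `fetch_abstract("10.48550/arXiv.2311.10204")` would issue the same request as the curl command above. Keeping the parsing in its own function means it can be checked against a saved response without network access.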

We can try to:

  • fix the line feeds added by the API
  • use the DataCite API in a background task to update the abstracts
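As an illustration of the first option, here is a rough Python sketch of one possible heuristic. It is not the fix that was eventually deployed. It assumes the API wraps at 80 characters and treats a line break as an artifact only when the first word of the next line would not have fit on the current line; `unwrap` and `to_html` are hypothetical names.

```python
import html


def unwrap(text, width=80):
    """Join lines that look wrapped at `width`; keep other line breaks.

    Heuristic (an assumption): a wrapping tool only breaks a line when the
    next word would not fit, so a break is treated as intentional when the
    next line's first word would still have fit on the current line.
    """
    lines = text.split("\n")
    pieces = [lines[0]]
    for previous, current in zip(lines, lines[1:]):
        words = current.split()
        first_word = words[0] if words else ""
        wrapped = bool(first_word) and bool(previous.strip()) and (
            len(previous) + 1 + len(first_word) > width
        )
        pieces.append((" " if wrapped else "\n") + current)
    return "".join(pieces)


def to_html(text, width=80):
    """Escape the unwrapped text and turn remaining breaks into <br>."""
    return html.escape(unwrap(text, width)).replace("\n", "<br>\n")
```

The heuristic cannot tell an intentional break from a wrap when the author's line happens to end close to the width limit, so some intended breaks could still be merged.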

Let's ignore HTML scraping.

rtournoy, Oct 04 '24 13:10

Sorry for the delay. I hope the display is better now; the fix is applied automatically to all articles. Please reopen if you find examples of incorrect or lost formatting.

Thanks for the bug report. We have also pushed other updates: see v1.0.52 - 2025-08-28 https://github.com/CCSDForge/episciences/blob/main/CHANGELOG.md

rtournoy, Aug 28 '25 23:08