MathML markup in titles breaks importing article metadata

Open stuartyeates opened this issue 8 months ago • 6 comments

Describe the bug

Scholia doesn't sanitize some characters in DOI metadata, such as MathML markup in titles.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://scholia.toolforge.org/doi/10.21083/irss.v39i0.2977
  2. Click on "Submit"
  3. Click on "Run"
  4. See the red error

Expected behavior

All characters are sanitized.

stuartyeates, Apr 25 '25 22:04

Another one that fails: https://scholia.toolforge.org/doi/10.1016/b978-0-44-315423-2.00006-0

stuartyeates, Apr 27 '25 09:04

Also:

  * https://scholia.toolforge.org/doi/10.1103/physrevd.111.l071102
  * https://scholia.toolforge.org/doi/10.1103/physrevd.111.032012
  * https://scholia.toolforge.org/doi/10.1103/physrevlett.134.011802

stuartyeates, Apr 28 '25 05:04

It is an open question whether the tags can simply be removed. Titles containing "< " tend to be OK, e.g., https://www.wikidata.org/wiki/Q100506723

But there is also https://www.wikidata.org/wiki/Q102060669 " Corrigendum to <Compensatory strategy between trunk-hip kinematics and reaction time following slip perturbation between subjects with and without chronic low back pain> < [Journal of Electromyography and Kinesiology 2018;43:68-74] >"

SELECT  
  ?work ?title
WHERE {
  ?work wdt:P1476 ?title .
  FILTER CONTAINS(?title, "< ")
} LIMIT 10
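
One possible way to handle both cases is to strip only known MathML element names rather than everything that looks like a tag, so that titles with literal angle brackets (like the Corrigendum above) are left alone. The following is only a rough sketch, not Scholia's actual code: the function name `strip_mathml`, the tag list, and the example title are assumptions made for illustration, and the tag list is certainly not exhaustive.

```python
import re

# Assumed, non-exhaustive list of MathML element names to strip.
MATHML_TAGS = (
    "math", "mrow", "mi", "mo", "mn", "msub", "msup", "msubsup",
    "mfrac", "msqrt", "mtext", "mspace", "mstyle", "mover", "munder",
)

# Matches opening, closing and self-closing tags for the names above,
# with an optional "mml:" namespace prefix and optional attributes.
_TAG_RE = re.compile(
    r"</?(?:mml:)?(?:" + "|".join(MATHML_TAGS) + r")(?:\s[^<>]*)?/?>",
    re.IGNORECASE,
)


def strip_mathml(title: str) -> str:
    """Remove MathML tags from a title, keeping their text content.

    Anything that is not a known MathML tag, including a bare "< " or
    angle-bracketed plain text, is left untouched.
    """
    without_tags = _TAG_RE.sub("", title)
    # Collapse whitespace runs left behind by removed tags.
    return re.sub(r"\s+", " ", without_tags).strip()


# Hypothetical title, made up for illustration.
example = (
    'Search for <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    "<mml:mi>B</mml:mi></mml:math> decays"
)
print(strip_mathml(example))
```

With an allow-list like this, a title such as "Corrigendum to <Compensatory strategy> < [Journal] >" would pass through unchanged, at the cost of missing any MathML elements that are not in the list.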

fnielsen, Apr 29 '25 10:04