scholia
OJS title extraction may differ between ":" and " - "
Describe the bug
OJS title extraction may differ between ":" and " - ". Scraping https://tidsskrift.dk/samfundsokonomen/article/view/143319 should give a title with ":" according to the metadata on the page, but the generated QuickStatements contain a form of dash: "Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?"
This is also the case with https://tidsskrift.dk/samfundsokonomen/article/view/143319
This means that the article is not matched, although it is already in Wikidata, e.g., https://scholia.toolforge.org/work/Q123561283
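One possible way to make the two title variants match is to compare them through a normalized key. The following is only a minimal sketch: `normalize_title` is a hypothetical helper, not an existing function in `scholia.scrape.ojs`, and it assumes that a spaced hyphen or dash and a colon are the only differences between the variants.

```python
import re
import unicodedata

# Hypothetical helper, not part of scholia.scrape.ojs.
# Matches a hyphen, en dash or em dash surrounded by whitespace,
# or a colon with optional surrounding whitespace.
_SEPARATOR = re.compile(r"\s+[-\u2013\u2014]\s+|\s*:\s*")


def normalize_title(title):
    """Return a comparison key in which ': ' and ' - ' are equivalent."""
    title = unicodedata.normalize("NFC", title)
    # Rewrite every subtitle separator to the same form.
    title = _SEPARATOR.sub(": ", title)
    # Collapse runs of whitespace and ignore case differences.
    return " ".join(title.split()).casefold()


scraped = "Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?"
metadata = "Civilsamfund og velfærdsstat: konflikt, samarbejde eller begge dele?"
print(normalize_title(scraped) == normalize_title(metadata))
```

Hyphens inside words (such as "e-mail") are left alone because the pattern requires whitespace on both sides of the dash. The key is meant for comparison only, not for writing the title back to Wikidata.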
To Reproduce
Steps to reproduce the behavior:
- python -m scholia.scrape.ojs issue-url-to-quickstatements https://tidsskrift.dk/samfundsokonomen/issue/view/10780
- LAST Len "Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?"
Expected behavior
Should identify https://scholia.toolforge.org/work/Q123561283
How do I know that https://scholia.toolforge.org/work/Q123561283 has been identified?
It seems that the title is displayed/set differently on the OJS pages.
So should it be "Civilsamfund og velfærdsstat: konflikt, samarbejde eller begge dele?" instead? What should the output look like? And once this is solved, should the identified articles have no entries in the output? I assume that an identified article would be commented out and put at the end of the output, but I don't know.
Good question. The PDF has the dash, while the meta information has a colon. It is unclear to me how I end up scraping the dash version...
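To check which title the page metadata actually carries, one could read the meta tags directly. This is a rough sketch under assumptions: it assumes the article page exposes the title in `citation_title` or `DC.Title` meta tags, the sample HTML is made up for illustration, and `MetaTitleParser` and `meta_titles` are hypothetical names rather than existing Scholia code.

```python
from html.parser import HTMLParser


class MetaTitleParser(HTMLParser):
    """Collect title-like <meta> tags from an HTML page (hypothetical helper)."""

    # Assumed tag names; the real page may use others.
    WANTED = ("citation_title", "DC.Title")

    def __init__(self):
        super().__init__()
        self.titles = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        # Keep the first occurrence of each wanted tag.
        if name in self.WANTED and name not in self.titles:
            self.titles[name] = attributes.get("content")


def meta_titles(html):
    """Return a dict mapping meta tag name to its content."""
    parser = MetaTitleParser()
    parser.feed(html)
    return parser.titles


# Made-up sample markup, not copied from the actual page.
sample = """
<html><head>
<meta name="citation_title" content="Civilsamfund og velfærdsstat: konflikt, samarbejde eller begge dele?"/>
<meta name="DC.Title" content="Civilsamfund og velfærdsstat: konflikt, samarbejde eller begge dele?"/>
</head><body><h1>Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?</h1></body></html>
"""
print(meta_titles(sample))
```

If the meta tags show the colon while the scraper returns the dash, the scraper is probably reading the title from another element on the page, such as the visible heading.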
Does this problem occur with all articles that contain a dash or a colon?
No, I do not think so.