OJS title extraction may differ between ":" and " - "

Open fnielsen opened this issue 1 year ago • 6 comments

Describe the bug: OJS title extraction may differ between ":" and " - ". Scraping https://tidsskrift.dk/samfundsokonomen/article/view/143319 should give a title with ":" according to the metadata on the page, but the generated QuickStatements contain a dash instead: "Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?"

This is also the case with https://tidsskrift.dk/samfundsokonomen/article/view/143319

This means that the article is not matched, even though it is already in Wikidata, e.g., https://scholia.toolforge.org/work/Q123561283

To Reproduce Steps to reproduce the behavior:

  1. python -m scholia.scrape.ojs issue-url-to-quickstatements https://tidsskrift.dk/samfundsokonomen/issue/view/10780
  2. LAST Len "Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?"

Expected behavior: The scraper should identify the existing item https://scholia.toolforge.org/work/Q123561283
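To illustrate the mismatch, here is a minimal sketch of a comparison key that treats ":" and a spaced dash as the same title/subtitle separator. The function name `normalize_title` and the set of separators are assumptions made for this example; this is not how Scholia currently matches titles.

```python
import re
import unicodedata

# Separators treated as interchangeable: a colon, or a hyphen / en dash /
# em dash with whitespace on both sides. This set is an assumption.
_SEPARATOR = re.compile(r"\s*:\s*|\s+[-\u2013\u2014]\s+")


def normalize_title(title):
    """Return a key for comparing titles that ignores the separator style."""
    title = unicodedata.normalize("NFKC", title)
    title = _SEPARATOR.sub(": ", title)
    # Collapse any remaining runs of whitespace.
    title = re.sub(r"\s+", " ", title).strip()
    return title.casefold()


scraped = "Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?"
in_metadata = "Civilsamfund og velfærdsstat: konflikt, samarbejde eller begge dele?"
print(normalize_title(scraped) == normalize_title(in_metadata))
```

Hyphens inside words (no surrounding spaces) are left alone, so compound words are not split. Whether a lookup against Wikidata could use such a key is a separate question.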

fnielsen avatar Feb 06 '24 14:02 fnielsen

How do we know that https://scholia.toolforge.org/work/Q123561283 is identified?

faresh9 avatar Feb 09 '24 09:02 faresh9

> How do we know that https://scholia.toolforge.org/work/Q123561283 is identified?

It seems that the title is displayed/set differently on the OJS pages.

fnielsen avatar Feb 09 '24 09:02 fnielsen

So should it be "Civilsamfund og velfærdsstat: konflikt, samarbejde eller begge dele?" instead? What should the output look like? And once this is solved, should identified articles have no entries in the output? I assume that if an article is identified, it will be commented out and put at the end of the output, but I am not sure.

faresh9 avatar Feb 09 '24 18:02 faresh9

Good question. The PDF has the dash, while the meta information has a colon. It is unclear to me how I end up scraping the dash version...
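One way to narrow down where the dash comes from could be to list every title candidate on the article page side by side. The sketch below uses only the standard library and runs on a made-up HTML snippet. It assumes the page exposes `citation_title` or `DC.Title` meta tags and an `<h1>` heading, and the class and function names are hypothetical. Which element actually carries the dash on the real page is not established in this thread.

```python
from html.parser import HTMLParser


class TitleCandidates(HTMLParser):
    """Collect title strings from selected <meta> tags and the first <h1>."""

    META_NAMES = {"citation_title", "dc.title"}  # assumed tag names

    def __init__(self):
        super().__init__()
        self.candidates = {}
        self._in_h1 = False
        self._h1_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.META_NAMES and attrs.get("content"):
                self.candidates.setdefault(name, attrs["content"].strip())
        elif tag == "h1" and "h1" not in self.candidates:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            # Join text fragments and collapse whitespace.
            self.candidates["h1"] = " ".join("".join(self._h1_parts).split())

    def handle_data(self, data):
        if self._in_h1:
            self._h1_parts.append(data)


def title_candidates(html):
    """Return a dict mapping the source of each title to its text."""
    parser = TitleCandidates()
    parser.feed(html)
    parser.close()
    return parser.candidates


# Made-up page fragment, for illustration only.
SAMPLE_HTML = """
<html><head>
<meta name="citation_title" content="Civilsamfund og velfærdsstat: konflikt, samarbejde eller begge dele?"/>
</head><body>
<h1>Civilsamfund og velfærdsstat – konflikt, samarbejde eller begge dele?</h1>
</body></html>
"""

for source, title in title_candidates(SAMPLE_HTML).items():
    print(source, "->", title)
```

If the candidates disagree on a real page, that would show which element the scraper should prefer.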

fnielsen avatar Feb 12 '24 08:02 fnielsen

Is this a problem with all articles that contain a dash or a colon?

faresh9 avatar Feb 12 '24 16:02 faresh9

> Is this a problem with all articles that contain a dash or a colon?

No, I do not think so.

fnielsen avatar Feb 12 '24 16:02 fnielsen