extraction-framework

Mixed rdfs:labels for many chemical compounds

Open rogargon opened this issue 3 years ago • 2 comments

Issue validity

The version is currently available from https://dbpedia.org/sparql

Error Description

Many chemical compounds seem to have their labels mixed up with one another for languages other than English (es, fr, ar, ...). For instance, http://dbpedia.org/resource/Cholesterol has more than 900 labels in Spanish, including many that clearly do not correspond to it, such as "Cocaina"...

Pinpointing the source of the error

  • SPARQL endpoint http://dbpedia.org/sparql

Details

Using the following query, many resources with more than 900 labels in Spanish are detected:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?concept (COUNT(?label) AS ?count)
FROM <http://dbpedia.org>
WHERE {
  # Keep only Spanish-language labels
  ?concept rdfs:label ?label
  FILTER(LANG(?label) = 'es')
}
GROUP BY ?concept
HAVING (COUNT(?label) > 900)

Example DBpedia resource URL(s)

http://dbpedia.org/resource/Cholesterol
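
To look at the labels attached to this one resource, a query along the following lines could be run against the same endpoint. This is a minimal sketch and not part of the original report: the ORDER BY and the LIMIT of 50 are arbitrary choices made here to keep the result readable.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# List Spanish labels attached to the example resource.
# LIMIT 50 is an arbitrary cap chosen for this sketch.
SELECT ?label
FROM <http://dbpedia.org>
WHERE {
  <http://dbpedia.org/resource/Cholesterol> rdfs:label ?label
  FILTER(LANG(?label) = 'es')
}
ORDER BY ?label
LIMIT 50
```

Changing 'es' to another language tag (for example 'fr' or 'ar') would show whether the same mixing appears for that language.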

Other

When the threshold is reduced to more than 100 labels, many other kinds of resources (including people) also show up. Their labels also seem incorrect, for example: https://dbpedia.org/page/Alexandra_of_Denmark
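
A sketch of the lower-threshold variant, assuming the same graph and language filter as the query under Details. The ORDER BY and LIMIT clauses are additions made here only to surface the most affected resources first; they are not part of the original query.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Resources with more than 100 Spanish labels, most-labelled first.
SELECT ?concept (COUNT(?label) AS ?count)
FROM <http://dbpedia.org>
WHERE {
  ?concept rdfs:label ?label
  FILTER(LANG(?label) = 'es')
}
GROUP BY ?concept
HAVING (COUNT(?label) > 100)
ORDER BY DESC(?count)
LIMIT 100
```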

rogargon, Jan 30 '22 15:01

How can it be resolved?

ritikBhandari, May 30 '22 06:05

This is an example of corruption that entered the release workflow at some point in the recent past. We've also seen the chemical label problem: in an earlier release, both synonym and language labels were more accurate than in recent releases. Similarly, we reported image corruption. While some problems have been corrected, many images are just plain wrong. Again, these problems did not exist in earlier releases, but unfortunately I don't have screenshots of correct data that I can contrast with the incorrect data. The bottom line: the quality of DBpedia data has degraded. New releases may have more items, but the fidelity of older items has degraded during transitions. How can we help restore the higher-quality data from previous releases?

jaygray0919, May 30 '22 14:05