extraction-framework

en.dbpedia.org instead of dbpedia.org ?

Open VladimirAlexiev opened this issue 8 years ago • 5 comments

http://mappings.dbpedia.org/server/extraction/en/extract?title=Great_Britain_men%27s_national_basketball_team&format=turtle-triples&extractors=custom produces triples that use en.dbpedia.org (which does not resolve) instead of dbpedia.org, e.g.:

http://en.dbpedia.org/resource/Great_Britain_men's_national_basketball_team (as subject) and http://en.dbpedia.org/resource/British_Basketball (as object).

So at least the extraction sampler is broken in this regard. But I suspect that production data is also broken, because http://dbpedia.org/resource/Great_Britain_men%27s_national_basketball_team returns nothing. (Yes, there is a page https://en.wikipedia.org/wiki/Great_Britain_men%27s_national_basketball_team, and it has existed for a few years.)

VladimirAlexiev avatar Mar 21 '16 08:03 VladimirAlexiev

The same holds for raw properties: the output above uses http://en.dbpedia.org/property/ instead of http://dbpedia.org/property/

VladimirAlexiev avatar Mar 21 '16 08:03 VladimirAlexiev

Actually, dbpedia.org has been the exception to all the rules ever since i18n was actively enabled :) Just as fr.wikipedia.org gives us fr.dbpedia.org, en.wikipedia.org should give us en.dbpedia.org, but it was too late to change that, and many applications would break if we did.

So the whole framework uses this language convention, but for en we have a special rule at the end of the extraction pipeline that replaces en.dbpedia.org with dbpedia.org.

It was not easy to apply this processing to every extraction output, so the extraction sampler has been like this for the last few years.
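As a rough illustration of that kind of post-processing step (a minimal sketch only, not the framework's actual code), the rewrite could look something like the Python below. The function name `rewrite_en_namespace`, the regular expression, and the `league` property in the sample triple are all hypothetical; the sketch assumes N-Triples-style output in which IRIs are wrapped in angle brackets.

```python
import re

# Assumption: only IRIs (text right after "<") are rewritten, so that a
# literal that happens to mention en.dbpedia.org is left untouched.
_EN_NAMESPACE = re.compile(r"(?<=<)http://en\.dbpedia\.org/")


def rewrite_en_namespace(line: str) -> str:
    """Rewrite <http://en.dbpedia.org/...> IRIs to <http://dbpedia.org/...>."""
    return _EN_NAMESPACE.sub("http://dbpedia.org/", line)


# Sample triple; the "league" property name is made up for illustration.
sample = (
    "<http://en.dbpedia.org/resource/Great_Britain_men's_national_basketball_team> "
    "<http://en.dbpedia.org/property/league> "
    "<http://en.dbpedia.org/resource/British_Basketball> ."
)
print(rewrite_en_namespace(sample))
```

Other language editions such as fr.dbpedia.org pass through unchanged, which matches the convention described above.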

We can either close this or leave it open in case it gets picked up as a GSoC warm-up task.

jimkont avatar Mar 21 '16 08:03 jimkont

Please keep it at least until it's explained why http://dbpedia.org/page/Great_Britain_men's_national_basketball_team is missing, yet it's returned by this query: select * {?country a dbo:Country}

VladimirAlexiev avatar Mar 21 '16 09:03 VladimirAlexiev

This is a different issue. @pkleef, is this related to the new 2015-10 version? I see the data is not yet deployed on dbpedia.org, but maybe the code from the adjusted VAD was.

jimkont avatar Mar 21 '16 09:03 jimkont

@VladimirAlexiev I took a closer look: the dbo:Country triple comes from the ST-Types provided by @HeikoPaulheim, and this is a duplicate of #241 and #414.

Regarding the display of http://dbpedia.org/page/Great_Britain_men's_national_basketball_team, if you do a DESCRIBE it works fine.
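For anyone who wants to reproduce the DESCRIBE check, here is a small sketch that builds the request URL. It assumes the public endpoint at http://dbpedia.org/sparql and the standard SPARQL protocol `query` parameter; the helper name `describe_url` is made up for this example, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint.
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"


def describe_url(resource_iri: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Build a GET URL that runs DESCRIBE <resource_iri> on the endpoint."""
    query = f"DESCRIBE <{resource_iri}>"
    # urlencode percent-encodes the query, including the apostrophe (%27).
    return f"{endpoint}?{urlencode({'query': query})}"


print(describe_url(
    "http://dbpedia.org/resource/Great_Britain_men's_national_basketball_team"
))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should show whether the resource has triples even when the /page/ view comes back empty.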

jimkont avatar Mar 21 '16 09:03 jimkont