extraction-framework
en.dbpedia.org instead of dbpedia.org?
http://mappings.dbpedia.org/server/extraction/en/extract?title=Great_Britain_men%27s_national_basketball_team&format=turtle-triples&extractors=custom produces triples with en.dbpedia.org (which does not resolve) instead of dbpedia.org, e.g.:
http://en.dbpedia.org/resource/Great_Britain_men's_national_basketball_team (as subject) and http://en.dbpedia.org/resource/British_Basketball (as object).
So at least the extraction sampler is broken in this regard. But I suspect that production data is also broken, because http://dbpedia.org/resource/Great_Britain_men%27s_national_basketball_team returns nothing. (Yes, there is a page https://en.wikipedia.org/wiki/Great_Britain_men%27s_national_basketball_team, and it has existed for a few years.)
The same holds for raw properties: the output above uses http://en.dbpedia.org/property/ instead of http://dbpedia.org/property/
Actually, dbpedia.org is the exception to all rules since I18n was actively enabled :) The same way we have fr.dbpedia.org from fr.wikipedia.org, we should also have en.dbpedia.org, but it was too late to change that and many applications would break if we did.
So the whole framework uses this language convention, but for en we have a special rule at the end of the extraction pipeline that replaces en.dbpedia.org with dbpedia.org.
It was not easy to put this processing in all extraction outputs, so the extraction sampler has been like this for the last few years.
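To illustrate the kind of rewrite described above, here is a minimal sketch in Python. It is not the framework's actual code: the function name `canonicalize_en_uris`, the choice to match only the `resource` and `property` namespaces, and the `federation` property in the sample triple are all assumptions made for illustration.

```python
import re

# Hypothetical post-processing rule (not the framework's real implementation):
# match English-chapter URIs in the resource and property namespaces only.
_EN_NAMESPACE = re.compile(r"http://en\.dbpedia\.org/(resource|property)/")


def canonicalize_en_uris(line: str) -> str:
    """Rewrite en.dbpedia.org URIs to dbpedia.org; other chapters are untouched."""
    return _EN_NAMESPACE.sub(r"http://dbpedia.org/\1/", line)


# Sample triple in the shape the sampler emits; the property name is made up.
sample = (
    "<http://en.dbpedia.org/resource/Great_Britain_men's_national_basketball_team> "
    "<http://en.dbpedia.org/property/federation> "
    "<http://en.dbpedia.org/resource/British_Basketball> ."
)

print(canonicalize_en_uris(sample))
```

Applying something like this to the sampler's output would be one way to make it match the production URIs, assuming the sampler can post-process each serialized line.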
We can either close this or leave it open in case it is picked up as a GSoC warm-up task.
Please keep it at least until it's explained why http://dbpedia.org/page/Great_Britain_men's_national_basketball_team is missing, yet it's returned by this query:
select * {?country a dbo:Country}
This is a different issue. @pkleef, is this related to the new 2015-10 version? I see the data is not yet deployed on dbpedia.org, but maybe the code from the adjusted VAD was.
@VladimirAlexiev took a closer look, and the dbo:Country triple comes from ST-Types provided by @HeikoPaulheim; this is a duplicate of #241 and #414.
Regarding the display of http://dbpedia.org/page/Great_Britain_men's_national_basketball_team: if you do a DESCRIBE, it works fine.
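For anyone who wants to repeat the DESCRIBE check, the following Python sketch builds the request URL. It assumes the public endpoint at https://dbpedia.org/sparql and the Virtuoso-style `format` parameter; `describe_url` is a hypothetical helper name, and the snippet only constructs the URL without sending a request.

```python
from urllib.parse import urlencode


def describe_url(resource_uri: str, endpoint: str = "https://dbpedia.org/sparql") -> str:
    """Build a GET URL that runs DESCRIBE on one resource (endpoint is an assumption)."""
    query = f"DESCRIBE <{resource_uri}>"
    # urlencode percent-encodes the query, including the apostrophe in the URI.
    return endpoint + "?" + urlencode({"query": query, "format": "text/turtle"})


url = describe_url(
    "http://dbpedia.org/resource/Great_Britain_men's_national_basketball_team"
)
print(url)
```

Opening the printed URL in a browser or fetching it with any HTTP client should return whatever triples the endpoint holds for the resource, independent of how the /page/ view renders it.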