extraction-framework
The software used to extract structured data from Wikipedia
Should we drop support for Table mappings in the framework? So far this has not worked as expected: http://mappings.dbpedia.org/index.php?title=Special%3APrefixIndex&prefix=Table&namespace=204 http://mappings.dbpedia.org/index.php/How_to_edit_DBpedia_Mappings#How_to_map_a_Wikipedia_Table
We need to create a test for validating contributions to the server ignore list and enable tests on the server module (only for now). Related PRs that caused problems: https://github.com/dbpedia/extraction-framework/pull/331...
In the mapping server we already provide statistics about the template & template property mapping coverage. A great addition would be an estimated class & property instance count based on...
I've just written unit tests for the FlagTemplateParser and many of them miserably fail. Here are things not working as advertised: - {{flag|...}} template with a country code as 1st...
This would facilitate working with, for example, the lookup code.
Each time someone changes an ignore list through the browser, it is [saved on the server](/dbpedia/extraction-framework/blob/master/server/src/main/scala/org/dbpedia/extraction/server/stats/IgnoreList.scala#L101) in the location where it was checked out from the repo. Then it should...
Stub categories seem to be implemented most of the time through transclusion of templates. E.g. the article https://en.wikipedia.org/w/index.php?title=Şıra&action=edit has:
```
{{Turkey-cuisine-stub}}
{{nonalcoholic-drink-stub}}
```
The fact that something is a stub...
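To illustrate the idea in this issue, here is a minimal Python sketch that picks stub templates out of raw wikitext. It assumes that stub templates can be recognised purely by a name ending in `stub`, which matches the two examples above but is not guaranteed for every wiki. The names `STUB_TEMPLATE` and `find_stub_templates` are hypothetical and are not part of the extraction framework.

```python
import re
from typing import List

# Assumption: a stub template is any template whose name ends in the word
# "stub", e.g. {{Turkey-cuisine-stub}}. Parameters after "|" are ignored.
STUB_TEMPLATE = re.compile(
    r"\{\{\s*([^{}|]*?\bstub)\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)


def find_stub_templates(wikitext: str) -> List[str]:
    """Return the names of all stub templates transcluded in the wikitext."""
    return [m.group(1).strip() for m in STUB_TEMPLATE.finditer(wikitext)]


if __name__ == "__main__":
    sample = "{{Turkey-cuisine-stub}}\n{{nonalcoholic-drink-stub}}"
    print(find_stub_templates(sample))
```

A real extractor would probably work on the parsed template nodes rather than on a regular expression over the page source; the regex only keeps the sketch self-contained.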
http://mappings.dbpedia.org/server/extraction/en/extract?title=Great_Britain_men%27s_national_basketball_team&format=turtle-triples&extractors=custom produces triples with en.dbpedia.org (which does not resolve) instead of dbpedia.org, e.g. http://en.dbpedia.org/resource/Great_Britain_men's_national_basketball_team (as subject) and http://en.dbpedia.org/resource/British_Basketball (as object). So at least the extraction sampler is broken in this...
(A simple warm-up task) See e.g. http://mappings.dbpedia.org/server/extraction/en/extract?title=Great_Britain_men%27s_national_basketball_team&format=turtle-triples&extractors=custom . The output is very hard to read because it doesn't use prefixes. Add as many common prefixes as possible, so the listing is more...
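As a rough illustration of what this task asks for, the following Python sketch shortens full IRIs to prefixed names when a known namespace matches. The prefix table is an assumption (a handful of commonly used namespaces, not the framework's actual list), and `compact` and `prefix_header` are hypothetical helper names. Local names are only abbreviated when they are plain identifiers, so IRIs containing characters such as an apostrophe are left in angle brackets rather than risk producing invalid Turtle.

```python
import re

# Assumption: a small, illustrative prefix table; the real list would be longer.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Conservative subset of what Turtle allows in a local name.
SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def compact(iri: str) -> str:
    """Return 'prefix:local' if a known namespace matches, else '<iri>'."""
    best = None
    for prefix, namespace in PREFIXES.items():
        # Prefer the longest matching namespace.
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is not None:
        local = iri[len(best[1]):]
        if SAFE_LOCAL.fullmatch(local):
            return f"{best[0]}:{local}"
    return f"<{iri}>"


def prefix_header() -> str:
    """Render the @prefix declarations to put at the top of the listing."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in sorted(PREFIXES.items()))


if __name__ == "__main__":
    print(prefix_header())
    print(compact("http://dbpedia.org/resource/British_Basketball"))
```

Falling back to the full IRI for anything outside the safe pattern keeps the output valid without implementing Turtle's escaping rules for local names.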
As noted on the mailing list by Andy Mabbett, the English Wikipedia community has decided to deprecate Persondata: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28proposals%29#RfC:_Should_Persondata_template_be_deprecated_and_methodically_removed_from_articles.3F aka https://goo.gl/ie8yed (page section will be archived shortly). In future, such data...