extraction-framework
The software used to extract structured data from Wikipedia
# Issue validity
The version is currently available from https://dbpedia.org/sparql
# Error Description
Many chemical compounds seem to have their labels mixed up with one another in languages other than English (es,...
Bumps [gson](https://github.com/google/gson) from 2.2.2 to 2.8.9.

Release notes, sourced from gson's releases. Gson 2.8.9:
- Make OSGi bundle's dependency on sun.misc optional (#1993).
- Deprecate Gson.excluder() exposing internal Excluder class (#1986).
- Prevent...
collection of stuff which was improved/fixed in template test branch currency conversion fix https://github.com/dbpedia/extraction-framework/issues/582 CombineSimpleMapping https://github.com/dbpedia/extraction-framework/issues/556 https://github.com/dbpedia/extraction-framework/issues/565 https://github.com/dbpedia/extraction-framework/issues/552 template transformation and configuration list expansion json provenance and debugging format dateinterval...
The latest-core collection at https://databus.dbpedia.org/dbpedia/collections/latest-core as downloaded on January 28, 2022 has many "Bad IRI" and "Illegal character in IRI" issues across the data as reported by Apache Jena's `riot...
Three extractors: .nifExtractor, .abstractExtractor, .abstractExtractorWikipedia
- all three produce mostly the same output, except for some pages, e.g. Joe Biden
- unclear whether they use the wikidump or the MediaWiki API
- overall...
# Issue validity
> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql. During these three months, Wikipedia changes...
# Issue validity
Live data on dbpedia.org.
# Error Description
There is a `http://`/`https://` mismatch between requested URIs and the URIs in the data.
# Details
Originally reported here: https://sourceforge.net/p/dbpedia/mailman/message/37362683/...
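To illustrate the kind of mismatch described, here is a minimal Python sketch of how a client might compare a requested URI with a URI found in the data while ignoring only the `http`/`https` scheme. The function name and the example URIs are assumptions made for illustration; this is not code from the extraction framework or from the DBpedia endpoint.

```python
from urllib.parse import urlsplit

WEB_SCHEMES = {"http", "https"}


def same_uri_ignoring_scheme(requested, stored):
    """Hypothetical helper: True if the two URIs differ at most in http vs https."""
    a, b = urlsplit(requested), urlsplit(stored)
    if a.scheme in WEB_SCHEMES and b.scheme in WEB_SCHEMES:
        # Compare netloc, path, query and fragment; skip the scheme (index 0).
        return a[1:] == b[1:]
    # For any other scheme, fall back to exact string comparison.
    return requested == stored
```

A plain string comparison treats `https://dbpedia.org/resource/Berlin` and `http://dbpedia.org/resource/Berlin` as different resources, which is the symptom the issue reports; the helper above treats them as the same.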
# Issue validity
I have an issue concerning the Commons links from a French Wikipedia page, as explained here: https://forum.dbpedia.org/t/commons-ressources-extractor-problem/1485
# Error Description
> Please state the nature...
Line #66650918 in https://downloads.dbpedia.org/repo/dbpedia/wikidata/sameas-all-wikis/2020.03.01/sameas-all-wikis.ttl.bz2 (subject `http://wikidata.dbpedia.org/resource/Q9398047`) contains the [U+FFFC](https://en.wiktionary.org/wiki/%EF%BF%BC) Unicode character in the object, which makes the object an invalid IRI (see https://tools.ietf.org/html/rfc3987#section-2.2). By the way, this causes parsing...
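A check of this kind could be sketched in Python as follows. This is only a rough illustration, not the validation performed by any DBpedia tool or by Apache Jena: the function name is hypothetical, the character class combines the characters excluded by the N-Triples `IRIREF` production with the Unicode Specials block (U+FFF0 to U+FFFF, which lies outside the `ucschar` ranges of RFC 3987), and the simple `<...>` matching can misfire on literals that happen to contain angle brackets.

```python
import re

# Characters not allowed inside <...> by the N-Triples IRIREF production,
# plus the Specials block U+FFF0-U+FFFF (contains U+FFFC), which is
# outside the ucschar ranges of RFC 3987.
BAD_IRI_CHAR = re.compile(r'[\x00-\x20<>"{}|^`\\\ufff0-\uffff]')

# Naive matcher for IRI terms on one line (assumption: one triple per line).
IRI_TERM = re.compile(r'<([^>]*)>')


def bad_iris(line):
    """Hypothetical helper: return (iri, offending_chars) pairs for one line."""
    found = []
    for iri in IRI_TERM.findall(line):
        chars = BAD_IRI_CHAR.findall(iri)
        if chars:
            found.append((iri, chars))
    return found
```

To scan the compressed dump line by line, the file could be opened with `bz2.open(path, "rt", encoding="utf-8")` and each line, together with its line number from `enumerate`, passed to `bad_iris`.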