extraction-framework
The software used to extract structured data from Wikipedia
Hi, I've encountered a bug in the DBpedia service: dates in the years 0-99 A.D. are mapped to 19(0-99) (example: http://dbpedia.org/page/Nero). I haven't dug into the DBpedia code, but I'm assuming that this...
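The symptom reported above can be illustrated with a minimal sketch. It does not claim to show where the bug lives in the extraction framework, since the report itself only states an assumption. The helper names `format_xsd_date` and `looks_like_century_shift` are hypothetical and exist only for this illustration.

```python
import datetime


def format_xsd_date(year: int, month: int, day: int) -> str:
    """Hypothetical helper: serialize a date with a zero-padded four-digit year.

    Python's datetime.date supports years 1 through 9999, so a first-century
    year is kept as-is and is not folded into the 1900s.
    """
    return datetime.date(year, month, day).isoformat()


def looks_like_century_shift(source_year: int, extracted_year: int) -> bool:
    """Hypothetical check for the reported symptom: year N (0-99) became 1900 + N."""
    return 0 <= source_year <= 99 and extracted_year == 1900 + source_year


if __name__ == "__main__":
    print(format_xsd_date(37, 12, 15))
    print(looks_like_century_shift(37, 1937))
```

A check like `looks_like_century_shift` could be run over extracted date literals to find affected resources, assuming the source year is available for comparison.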
I have a few concerns about [citationIri](https://github.com/dbpedia/extraction-framework/blob/807d7bc8fd825da8e404e4d8050d9c6ae3207b0d/core/src/main/scala/org/dbpedia/extraction/mappings/CitationExtractor.scala#L106). It's trying to make a URL for the citation from its properties: 1. @jimkont please confirm that even though it's a `for` loop,...
See http://mappings.dbpedia.org/server/extraction/sr/extract?revid=19189789&format=trix&extractors=custom for an example. The extraction framework outputs the following IRI for this resource: http://sr.dbpedia.org/resource/Project_talk:Администраторска_табла However, the actual namespace (and Wikipedia article) name is https://sr.wikipedia.org/wiki/Разговор_о_Википедији:Администраторска_табла Through http://sr.wikipedia.org/wiki/Project_talk:Администраторска_табла you still get redirected to...
- Feeder reads 5000 records and puts them in the queue
- these get extracted, but it seems they don't get updated in the cache DB
- on the next request it gets the...
My issue is pretty much the same as the first problem described in https://github.com/dbpedia/extraction-framework/issues/556 - during extraction of the (German) Wikipedia dump, a lot of `Tried to convert inconvertible unit`...
Running rapper over the changesets at http://downloads.dbpedia.org/live/changesets/2019/ Rapper log: http://95.217.42.166/rapper-changesets-2019.bz2 `find changesets/2019 | grep 'nt.gz$' | xargs zcat | rapper -i ntriples -c - http://base.org 2>&1 | lbzip2 -zc >...
Performing the following query against DBpedia:
```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?country ?label ?longName ?name WHERE {
  ?country a dbo:Country.
  ?country dbo:capital ?capital.
  ?country rdfs:label ?label ....
```
While converting `nif-text-links_lang=en.ttl` from RDF to HDT using https://github.com/rdfhdt/hdt-cpp/tree/develop/libhdt I get the following error:
> error: /data/milan/nif-text-links_lang=en.ttl:7388119:282: invalid IRI escape

`nif-text-links_lang=en.ttl` comes from https://databus.dbpedia.org/marvin/text/nif-text-links/ version `2020.02.01`. The problem is...
https://en.wikipedia.org/wiki/The_Ren_%26_Stimpy_Show is encoded as https://dbpedia.org/resource/The_Ren_&_Stimpy_Show To check, run `curl http://dbpedia-mappings.tib.eu/release/mappings/mappingbased-literals/2019.06.01/mappingbased-literals_lang=en.ttl.bz2 | bzcat | cut -f1 -d '>' | grep '&'` on https://databus.dbpedia.org/marvin/mappings/mappingbased-literals/2019.06.01
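The two spellings in the report above differ only in whether `&` is percent-encoded. A small sketch using Python's standard `urllib.parse` shows how one form maps to the other; it does not claim which form the extraction framework is meant to emit, and the helper names `decode_title` and `encode_title` are hypothetical.

```python
from urllib.parse import quote, unquote


def decode_title(encoded: str) -> str:
    """Turn percent-escapes such as %26 back into literal characters."""
    return unquote(encoded)


def encode_title(title: str) -> str:
    """Percent-encode everything except letters, digits and the characters _ . - ~

    With safe="" even "/" and "&" are escaped; non-ASCII characters are
    encoded as UTF-8 bytes.
    """
    return quote(title, safe="")


if __name__ == "__main__":
    print(decode_title("The_Ren_%26_Stimpy_Show"))
    print(encode_title("The_Ren_&_Stimpy_Show"))
```

Comparing `encode_title(decode_title(x))` against `x` is one way to spot titles whose escaping changed between Wikipedia and the extracted IRI.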
https://github.com/dbpedia/extraction-framework/blob/live-deployed/live/src/main/java/org/dbpedia/extraction/live/feeder/EventStreamsFeeder.java If the Akka stream fails, the initial time is used instead of `latestProcessDate`, which cascades into a "maxLine exceeded" illegal state exception. There was an attempt to fix this, but it is unclear...
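The behavior the report asks for, resuming from the last processed timestamp rather than the initial one, can be sketched generically. This is not the code of `EventStreamsFeeder.java`: `consume_with_resume`, `open_stream`, and the retry limit are hypothetical names and choices made for illustration, and a plain `ConnectionError` stands in for a stream failure.

```python
from typing import Callable, Iterable, Iterator, Tuple

# (ISO-8601 timestamp, payload); a simplified, assumed event shape.
Event = Tuple[str, str]


def consume_with_resume(
    open_stream: Callable[[str], Iterable[Event]],
    initial_time: str,
    max_retries: int = 3,
) -> Iterator[Event]:
    """Yield events, reopening the stream from the last processed timestamp."""
    latest_processed = initial_time
    retries = 0
    while True:
        try:
            # Always reconnect from latest_processed, never from initial_time.
            for timestamp, payload in open_stream(latest_processed):
                latest_processed = timestamp
                retries = 0  # progress was made, so reset the failure counter
                yield timestamp, payload
            return  # the stream ended normally
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
```

Depending on how the upstream service interprets the start timestamp, resuming this way may redeliver the last event, so a consumer would likely need to deduplicate.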