extraction-framework
The software used to extract structured data from Wikipedia
See here for details: http://sourceforge.net/mailarchive/forum.php?thread_name=17558.147.91.1.44.1385999403.squirrel%40mail.imp.bg.ac.rs&forum_name=dbpedia-developers
Hi, I use the dbpedia extraction framework to extract link and category information. I have this: ``` val source = XMLSource.fromFile(new File("enwiki-latest-pages-articles.xml"), Language.English) source.toIterable .zipWithIndex .map { page: WikiPage =>...
The Live module mainly uses Exception.getMessage to log exceptions. As per the Javadoc, the message can be null, which leads to useless log lines. It would be better to use Exception.toString...
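A minimal sketch of the difference, assuming nothing beyond the standard JDK `Throwable` API. The object and helper names (`LogMessageDemo`, `viaGetMessage`, `viaToString`) are hypothetical and are not part of the Live module. An exception constructed without a message returns `null` from `getMessage`, while `toString` always yields at least the class name.

```scala
object LogMessageDemo {
  // What a log line built from getMessage would contain; String.valueOf turns null into "null".
  def viaGetMessage(e: Throwable): String = String.valueOf(e.getMessage)

  // Throwable.toString returns the class name, plus ": <message>" when a message exists.
  def viaToString(e: Throwable): String = e.toString

  def main(args: Array[String]): Unit = {
    val noMessage = new RuntimeException()       // constructed without a message
    println(viaGetMessage(noMessage))            // prints "null"
    println(viaToString(noMessage))              // prints "java.lang.RuntimeException"
    println(viaToString(new RuntimeException("boom"))) // prints "java.lang.RuntimeException: boom"
  }
}
```

Logging `e.toString` (or passing the exception object itself to the logger) keeps at least the exception type in the log even when no message was set.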
See her entry here: http://en.wikipedia.org/wiki/Theodora_(wife_of_Justinian_I) It reads both places: ``` . . ``` But the dates, which are in the infobox, are not read. I would guess the...
See: http://live.dbpedia.org/page/Isaac_Newton The birth date in DBpedia is: --01-04 The data in the Wikipedia page seems a bit weird: ``` | birth_date = 25 December 1642{{small|[[[Old Style and New Style...
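The value `--01-04` is in the `xsd:gMonthDay` lexical form `--MM-DD`: month 01, day 04, with no year. For context, 4 January 1643 is the New Style equivalent of 25 December 1642 (Old Style), which suggests the year was dropped somewhere along the way; the truncated report does not say where. A small sketch using `java.time.MonthDay`, whose `parse` and `toString` use the same `--MM-dd` form; the object and helper names are hypothetical.

```scala
import java.time.{LocalDate, MonthDay}

object GMonthDayDemo {
  // Parses an xsd:gMonthDay-style literal such as "--01-04" (no timezone part).
  def parseGMonthDay(lexical: String): MonthDay = MonthDay.parse(lexical)

  // Hypothetical: the full date once a year is supplied again.
  def toDate(md: MonthDay, year: Int): LocalDate = md.atYear(year)

  def main(args: Array[String]): Unit = {
    val md = parseGMonthDay("--01-04")
    println(md.getMonth)      // prints "JANUARY"
    println(md.getDayOfMonth) // prints "4"
    println(toDate(md, 1643)) // prints "1643-01-04"
  }
}
```

A month-day value alone cannot be turned back into an `xsd:date` without the year, so the year has to be kept at extraction time.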
The data seems a bit off here: in the live store, you show it as both a gMonthDay and a date: http://live.dbpedia.org/page/Jandek But in the [persondata dump](http://downloads.dbpedia.org/3.9/en/persondata_en.nq.bz2), it only has...
Not sure if this is actually an error. When I use an XML representation of the long abstracts, I get parse errors when accessing some abstracts because they contain invalid...
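A minimal sketch, assuming the parse errors come from code points that XML 1.0 does not allow (for example C0 control characters, `U+FFFE`, or unpaired surrogates). It filters a string against the XML 1.0 `Char` production; `XmlCharFilter` and its methods are hypothetical names, not part of the framework.

```scala
object XmlCharFilter {
  // XML 1.0 "Char" production:
  // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  def isXml10Char(cp: Int): Boolean =
    cp == 0x9 || cp == 0xA || cp == 0xD ||
      (cp >= 0x20 && cp <= 0xD7FF) ||
      (cp >= 0xE000 && cp <= 0xFFFD) ||
      (cp >= 0x10000 && cp <= 0x10FFFF)

  // Drops every code point outside the production; unpaired surrogates are
  // read as single code points in 0xD800-0xDFFF and are therefore dropped too.
  def stripInvalid(s: String): String = {
    val sb = new java.lang.StringBuilder
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      if (isXml10Char(cp)) sb.appendCodePoint(cp)
      i += Character.charCount(cp)
    }
    sb.toString
  }
}
```

Whether such characters should be stripped, replaced, or escaped before the abstracts are written out is a design choice the report leaves open; stripping is shown only because it is the simplest option.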
In [Language.scala](https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/util/Language.scala) there is a map for language codes that do not follow ISO-639-1. I think some mappings are not correct. For example, "lez" -> "ru": the Lezgian language is mapped to Russian....
DBpedia 2014 dataset, short_abstracts_en file downloaded from http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/short_abstracts_en.nt.bz2 on 9/29/2014: `wget http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/short_abstracts_en.nt.bz2` `bunzip2 short_abstracts_en.nt.bz2` `head -n 1263475 short_abstracts_en.nt | tail > parse_error.nt` `arq --strict --data parse_error.nt --query query.rq` 08:53:18 ERROR...
1) Consider http://mappings.dbpedia.org/index.php/OntologyProperty:FirstAscent: it specifies `rdfs:domain Mountain, Volcano`. The author of that mapping probably thought this means that the property `firstAscent` should apply to `Mountain` or `Volcano`. But by RDFS...
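Under RDFS semantics, entailment rule rdfs2 (if `p rdfs:domain C` and `s p o`, then `s rdf:type C`) fires separately for every domain statement. Assuming the mapping yields two triples, `firstAscent rdfs:domain Mountain` and `firstAscent rdfs:domain Volcano`, any subject using the property is inferred to be both a `Mountain` and a `Volcano` (the intersection, not the union). The sketch below applies rdfs2 to a toy triple set; the prefixed names and the subject `dbr:Matterhorn` are illustrative only.

```scala
object DomainInference {
  final case class Triple(s: String, p: String, o: String)

  val Domain = "rdfs:domain"
  val Type = "rdf:type"

  // Rule rdfs2, applied once per (property, domain) pair: a property with
  // two domains makes every subject an instance of BOTH classes.
  def inferTypes(triples: Set[Triple]): Set[Triple] = {
    val domains = triples.collect { case Triple(p, Domain, c) => p -> c }
    for {
      Triple(s, p, _) <- triples
      (dp, c) <- domains
      if dp == p
    } yield Triple(s, Type, c)
  }

  def main(args: Array[String]): Unit = {
    val data = Set(
      Triple("dbo:firstAscent", Domain, "dbo:Mountain"),
      Triple("dbo:firstAscent", Domain, "dbo:Volcano"),
      Triple("dbr:Matterhorn", "dbo:firstAscent", "ex:someValue") // hypothetical value
    )
    // Yields both (dbr:Matterhorn rdf:type dbo:Mountain) and (dbr:Matterhorn rdf:type dbo:Volcano).
    inferTypes(data).foreach(println)
  }
}
```

Expressing "Mountain or Volcano" would need a single domain class that is their union (for example an OWL `owl:unionOf` class or a common superclass), which plain RDFS domain statements cannot express.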