extraction-framework

The software used to extract structured data from Wikipedia

Results: 150 extraction-framework issues

See here for details: http://sourceforge.net/mailarchive/forum.php?thread_name=17558.147.91.1.44.1385999403.squirrel%40mail.imp.bg.ac.rs&forum_name=dbpedia-developers

enhancement
GSoC Warmup task
type: data
status: triage-discussion-needed

Hi, I use the DBpedia extraction framework to extract link and category information. I have this:
```
val source = XMLSource.fromFile(new File("enwiki-latest-pages-articles.xml"), Language.English)
source.toIterable
  .zipWithIndex
  .map { page: WikiPage =>...
```

question
type: data
status: cannot reproduce
status: triage-discussion-needed

The Live module mainly uses Exception.getMessage to log exceptions. Per the Javadoc, the message can be null, which leads to useless log lines. It would be better to use Exception.toString...

type: software-bug
enhancement
status: triage-discussion-needed
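
To illustrate the point about null messages, here is a minimal sketch in Scala. The `ExceptionLogging` object and its `describe` helper are hypothetical names made up for this example and are not part of the Live module; only the standard `Throwable.getMessage` and `Throwable.toString` methods are relied on.

```scala
object ExceptionLogging {
  // Hypothetical helper (not from the Live module): builds a log-friendly description.
  // Throwable.toString returns the class name, followed by ": " and the
  // localized message only when that message is non-null.
  def describe(e: Throwable): String = e.toString

  def main(args: Array[String]): Unit = {
    val noMessage = new RuntimeException()
    // getMessage is null here, so logging it alone says nothing about the failure.
    println("getMessage: " + noMessage.getMessage)
    // toString still names the exception class.
    println("toString:   " + describe(noMessage))

    val withMessage = new IllegalStateException("queue closed")
    println("toString:   " + describe(withMessage))
  }
}
```

With an exception created without a message, `getMessage` yields null while `toString` still reports the exception class, which is why the latter is the safer default for a log line.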

See her entry here: http://en.wikipedia.org/wiki/Theodora_(wife_of_Justinian_I) It reads both the places: ``` . . ``` But the dates, which are in the info box, are not read. I would guess the...

type: data
status: fix-required
status: minidump-test-required

See: http://live.dbpedia.org/page/Isaac_Newton The birth date in dbpedia is: --01-04 The data in the wikipedia page seems a bit weird: ``` | birth_date = 25 December 1642{{small|[[[Old Style and New Style...

type: data
status: triage-discussion-needed

The data seems a bit off here: in the live store, it is shown as both a gMonthDay and a date: http://live.dbpedia.org/page/Jandek But in the [persondata dump](http://downloads.dbpedia.org/3.9/en/persondata_en.nq.bz2), it only has...

type: data
status: cannot reproduce
status: triage-discussion-needed

Not sure if this is actually an error. When I use an XML representation of the long abstracts, I get parse errors when accessing some abstracts because they contain invalid...

GSoC Warmup task
type: data
status: fix-provided
status: minidump-test-required

In [Language.scala](https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/util/Language.scala) there is a map for language codes which do not follow ISO-639-1. I think some mappings are not correct. For example, "lez" -> "ru" maps the Lezgian language to Russian....

type: data
status: test-method-required
status: triage-discussion-needed

dbpedia 2014 dataset short_abstracts_en file downloaded from http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/short_abstracts_en.nt.bz2 on 9/29/2014
wget http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/short_abstracts_en.nt.bz2
bunzip2 short_abstracts_en.nt.bz2
head -n 1263475 short_abstracts_en.nt | tail > parse_error.nt
arq --strict --data parse_error.nt --query query.rq
08:53:18 ERROR...

type: data
status: fix-provided
status: minidump-test-required

1) Consider http://mappings.dbpedia.org/index.php/OntologyProperty:FirstAscent: it specifies `rdfs:domain Mountain, Volcano`. The author of that mapping probably thought this means that the property `firstAscent` should apply to `Mountain` or `Volcano`. But by RDFS...

type: data
status: triage-discussion-needed
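
The issue text above is cut off, so what follows is only a sketch of the standard RDFS reading of multiple `rdfs:domain` statements, not a reconstruction of the author's argument. Under RDFS, each domain statement applies on its own, so any subject of the property is inferred to belong to every declared class. The `DomainEntailment` object and its in-memory map are hypothetical and are not the extraction framework's ontology API.

```scala
object DomainEntailment {
  // Hypothetical stand-in for the ontology's property declarations;
  // this is not the extraction framework's API.
  val domains: Map[String, Set[String]] =
    Map("firstAscent" -> Set("Mountain", "Volcano"))

  // RDFS entailment rule rdfs2: from (p rdfs:domain C) and (s p o),
  // infer (s rdf:type C). The rule fires once per declared domain,
  // so the subject ends up typed with all of them, not just one.
  def inferredSubjectTypes(property: String): Set[String] =
    domains.getOrElse(property, Set.empty[String])

  def main(args: Array[String]): Unit = {
    val types = inferredSubjectTypes("firstAscent").toList.sorted
    println("Any subject of firstAscent is inferred to be: " + types.mkString(" and "))
  }
}
```

Read this way, declaring both `Mountain` and `Volcano` as domains says that everything with a `firstAscent` is both a mountain and a volcano, which is an intersection and not the "either one" reading the mapping author probably intended.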