GSoC
Run Extraction Framework
Effort
1-2 days
Skills
basic Maven, following a README
Description
The DBpedia extraction framework can download a set of Wikipedia XML dumps and extract facts. There is a configuration file where you specify the language(s) you want; then you just run it. Set up your download & extract configuration files and run a simple dump-based extraction.
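A minimal sketch of what that workflow looks like, assuming a Unix shell, Maven, Java 8, and that the commands are run from the dump module as described in the README (the config file names are the ones mentioned in the comments below; the keys inside them may differ between checkouts):

    # build the framework, then run download and extraction from the dump module
    git clone https://github.com/dbpedia/extraction-framework.git
    cd extraction-framework
    mvn clean install
    cd dump
    ../run download download.minimal.properties        # fetch the configured Wikipedia dumps
    ../run extraction extraction.default.properties    # extract facts from the downloaded dumps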
Impact
Get to know the way the extraction framework works.
Hi. This is in reference to issue #24 . I downloaded the project and ran a dump-based extraction. Everything went well; I just faced a Java issue before the extraction (I had to make sure to use Java 1.8, and used jenv for this). However, I had to stop the ../run download download.10000.properties
command at
date page 'https://dumps.wikimedia.org/wikidatawiki/20190101/' has all files [pages-articles-multistream.xml.bz2]
downloading 'https://dumps.wikimedia.org/wikidatawiki/20190101/wikidatawiki-20190101-pages-articles-multistream.xml.bz2' to '/Users/anubhavujjawal/Desktop/data/extraction-data/2018-10/wikidatawiki/20190101/wikidatawiki-20190101-pages-articles-multistream.xml.bz2' read 28.0153 MB of 58.74201 GB in 01:52 min
since I didn't have the bandwidth and disk space (I use a MacBook Air 128 GB model) to complete it. After that, I ran ../run extraction extraction.default.properties
and it ran well. Have I messed anything up?
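For reference, the Java pinning step was roughly the following (a sketch, assuming jenv is installed and a JDK 8 is registered with it; the JDK path and the version alias are examples and will differ on other machines):

    # example macOS JDK path; adjust to wherever your JDK 8 lives
    jenv add /Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home
    jenv local 1.8        # pin Java 1.8 for this directory
    java -version         # should report 1.8.x before building and running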
Hi everyone and @mgns . I tried the dump-based extraction instructions here with both the download.10000.properties and download.minimal.properties download config files, and got the error "Caused by: java.lang.IllegalArgumentException: Base directory does not exist yet: \data\extraction-data\2018-10" for both.
I tried creating the directories from the root as /data/extraction-data/2018-10, but I still got the error.
Is there any solution to this? Thank you very much.
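One thing worth checking, hedged as a guess from the backslashes in the error message: the path appears to be resolved as a relative Windows path, so the base-dir property in the download config (key name assumed here from the error context) may need to be an absolute path to a directory that already exists before the run, e.g.:

    # create the directory first (Unix-like example; on Windows use an equivalent
    # absolute path such as C:\data\extraction-data\2018-10)
    mkdir -p /data/extraction-data/2018-10
    # then point the download config at it (assumed key name):
    # base-dir=/data/extraction-data/2018-10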