wikiforia
A Utility Library for Wikipedia dumps
So I ran the main method using the following command: `java -jar wikiforia-1.2.1.jar -pages /home/blo/wikiforia-master/enwiki-20170420-pages-articles1.xml-p10p30302.bz2 -output /home/blo/wikiforia-master/wiki-extract/ -outputformat plain-text` but I ran into the following error: Exception in thread "main" java.io.IOError:...
I don't know if you want this, but I found it useful for processing the pages with Apache Beam/Google Dataflow. It allows the user to presume that each row in...
Fixed a bug in PlainTextWikipediaPageWriter. If you specified an output file that does not exist, it would crash. Since the point of an output file is to create something new,...
I tried `java -jar wikiforia-1.2.1.jar -pages /home/sudeshna/wikiforia-master/enwiki-20161201-pages-articles-multistream.xml.bz2 -output /home/sudeshna/ -outputformat plain-text` and I am getting this exception: ``` Exception in thread "main" java.io.IOError: java.io.IOException: unexpected end of stream at se.lth.cs.nlp.mediawiki.parser.MultistreamBzip2XmlDumpParser$PageReader.<init>(MultistreamBzip2XmlDumpParser.java:213) at...
hey Marcus I tried ``` git clone mvn compile mvn package ``` so far so good (ok I had a little trouble figuring out that the easiest way to respect...
Hello. Will you deploy your cool project to the global Maven repository? I searched for the following definition from pom.xml, but there were no results: `se.lth.cs.nlp` `wikiforia` `1.2.1`
Hi, I'm using wikiforia (version 1.1.1) to parse the English Wikipedia dump ("version" 20150602) and I encounter this error, which makes wikiforia stop. Below is the log. How can I...
`java -jar target/wikiforia-1.2.1.jar --pages ../frwiki-20150602-pages-articles-multistream.xml.bz2 -lang fr -o xml` (interrupt after a couple of minutes, since the issue is in the first pages). Example: Amsterdam, id = 245 Le...
Several cases are mishandled in frwiki, fortunately not all: only those where apostrophes are mixed with markup symbols. 1. `L{{'}}'''Andalousie'''` becomes LAndalousie LAndalousie (Andalucía en espagnol, du bas latin ”Vandalucia” in...