wikiforia
a one-page-per-line option
I don't know if you want this, but I found it useful for processing the pages with Apache Beam/Google Dataflow. It lets the user assume that each row in the output file is a self-contained page, starting with a signature of attributes like {{page:id}}.
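Because every row carries its own signature, a downstream step (for example a per-element function in a Beam pipeline) can recover the page id from one line without any cross-line state. The following is a minimal sketch, assuming the signature is literally {{page:<numeric id>}} at the very start of the row and followed directly by the page text. The class OneLineRowParser and its methods are hypothetical and not part of Wikiforia.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Wikiforia: splits one output row into
// its page id and body, ASSUMING rows look like "{{page:<id>}}<text>".
public final class OneLineRowParser {
    // Assumed signature format: "{{page:" + numeric id + "}}" at the start of the row.
    private static final Pattern SIGNATURE = Pattern.compile("^\\{\\{page:(\\d+)\\}\\}");

    private OneLineRowParser() {}

    /** Returns the page id from the row's signature, or -1 if the row has no signature. */
    public static long pageId(String row) {
        Matcher m = SIGNATURE.matcher(row);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Returns the text after the signature, or the whole row if there is no signature. */
    public static String body(String row) {
        Matcher m = SIGNATURE.matcher(row);
        return m.find() ? row.substring(m.end()) : row;
    }

    public static void main(String[] args) {
        String row = "{{page:12}}Example page text";
        System.out.println(pageId(row)); // 12
        System.out.println(body(row));   // Example page text
    }
}
```

Since each row is parsed in isolation, a text-file source that splits input by newline can hand rows to workers in any order without breaking a page apart.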
Changes:
- Added the one-lined option to App.java
- Added a "signature" to WikipediaPage.java
- Added the OneLineWikipediaPageWriter.java that writes each article to a single row
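The essential job of a one-row-per-page writer is to make sure no line break inside the page text leaks into the output. Below is a rough sketch of one way to do that. It is not the code in OneLineWikipediaPageWriter.java: the class name, the write(id, text) signature, the {{page:<id>}} signature format, and the choice to escape line breaks as \n and \r (rather than, say, replacing them with spaces) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch, NOT the PR's OneLineWikipediaPageWriter: writes each
// page as exactly one row, prefixed with an ASSUMED "{{page:<id>}}" signature.
public final class OneLinePageWriterSketch {
    private final Writer out;

    public OneLinePageWriterSketch(Writer out) {
        this.out = out;
    }

    /** Escapes backslashes first, then CR and LF, so the text fits on one row reversibly. */
    public static String escape(String text) {
        return text.replace("\\", "\\\\")
                   .replace("\r", "\\r")
                   .replace("\n", "\\n");
    }

    /** Writes one page as a single newline-terminated row. */
    public void write(long id, String text) {
        try {
            out.write("{{page:" + id + "}}");
            out.write(escape(text));
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        OneLinePageWriterSketch writer = new OneLinePageWriterSketch(buffer);
        writer.write(12, "First line\nSecond line");
        writer.write(25, "Single line");
        System.out.print(buffer);
        // {{page:12}}First line\nSecond line
        // {{page:25}}Single line
    }
}
```

Escaping the backslash before the line breaks keeps the transformation reversible, so a consumer can restore the original text exactly; replacing line breaks with spaces would be simpler but would lose paragraph boundaries.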