a one-page-per-line option

Open aaaton opened this issue 8 years ago • 0 comments

I don't know if you want this, but I found it useful for processing the pages with Apache Beam/Google Dataflow. It allows the user to assume that each row in the output file is a self-contained page, starting with a signature of attributes like: {{page:id}}

Changes:

  • Added the one-page-per-line option to App.java
  • Added a "signature" to WikipediaPage.java
  • Added the OneLineWikipediaPageWriter.java that writes each article to a single row
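
To illustrate the idea, here is a minimal sketch of the one-page-per-line format. It is not the code from this change: the class `OneLinePageSketch`, its methods, and the exact `{{page:<id>}}` signature layout are assumptions made for the example, and the real `OneLineWikipediaPageWriter.java` may escape line breaks or build the signature differently.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch, not the actual OneLineWikipediaPageWriter.
class OneLinePageSketch {

    // Assumed signature layout; the real signature may carry more attributes.
    static String signature(long id) {
        return "{{page:" + id + "}}";
    }

    // Collapse all line breaks so the whole page fits on a single row.
    static String toLine(long id, String text) {
        String flat = text.replace("\r\n", " ").replace('\n', ' ').replace('\r', ' ');
        return signature(id) + flat;
    }

    // Write one page as exactly one newline-terminated row.
    static void write(Writer out, long id, String text) throws IOException {
        out.write(toLine(id, text));
        out.write('\n');
    }

    // Downstream side: recover the page id from the start of a row.
    static long pageId(String line) {
        int end = line.indexOf("}}");
        return Long.parseLong(line.substring("{{page:".length(), end));
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        write(out, 12, "First line\nSecond line");
        write(out, 13, "Another page");
        System.out.print(out);
        // prints:
        // {{page:12}}First line Second line
        // {{page:13}}Another page
    }
}
```

Because every row is self-contained, a line-oriented reader (such as a text source in Beam/Dataflow) can hand each row to a worker independently, and the worker can call something like `pageId` to recover the attributes without seeing neighbouring rows.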

aaaton, Feb 24 '17 11:02