wikidump
This Library
Hey this code looks perfect for a research project I'm working on. I downloaded the code through canopy and now I'm just trying to figure out how this code works. Do you have any documentation or file to start reading to understand better?
I don't have any plans to develop this project in the immediate future. That said, it is in a usable state, and I've used it myself fairly recently. I'm not familiar with Canopy, but if you install this like a normal Python package, it will install a command-line tool, `wikidump`; `wikidump -h` provides some details on how to use it.

When run, `wikidump` generates a config file, `wikidump.cfg`, in the directory it was run in. This config file contains two paths you will need to amend: `scratch`, where the indexes are stored, and `xml_dumps`, the path to a directory containing the XML dumps downloaded from Wikipedia. I've personally been using wp-download to download the dumps, so `xml_dumps` should be set to the path that wp-download saves them to. After downloading the relevant dumps, run `wikidump index`.
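For reference, the edited part of `wikidump.cfg` might end up looking something like this (the section header here is a guess on my part; keep whatever header the generated file actually uses, and substitute your own paths):

```ini
[paths]
scratch = /data/wikidump/scratch
xml_dumps = /data/wikidump/dumps
```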
Thereafter, you can use `wikidump dataset` to pull out a dataset. Each of the subcommands should have a bit of help text, for example `wikidump dataset -h`. Let me know if there is anything specific you need to figure out how to do, and I'll see whether it can be done under the current implementation.
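As an aside, a quick way to sanity-check the config before indexing is to read it back with Python's standard `configparser`. This is just a sketch, not part of wikidump itself — and since I don't remember the exact section name `wikidump` writes, it scans every section for the two options rather than assuming one:

```python
import configparser
import pathlib

def dump_paths(cfg_file):
    """Return the scratch and xml_dumps paths from a wikidump-style config.

    Scans every section so we don't have to assume the section name
    that wikidump actually uses when it writes the file.
    """
    cfg = configparser.ConfigParser()
    cfg.read(cfg_file)
    found = {}
    for section in cfg.sections():
        for key in ("scratch", "xml_dumps"):
            if cfg.has_option(section, key):
                found[key] = pathlib.Path(cfg.get(section, key))
    return found

# Example with a toy config file (so we don't touch a real wikidump.cfg):
pathlib.Path("example.cfg").write_text(
    "[paths]\nscratch = /tmp/scratch\nxml_dumps = /tmp/dumps\n"
)
paths = dump_paths("example.cfg")
print(paths["xml_dumps"])  # /tmp/dumps
```

From there you could, for instance, check that both directories exist before kicking off `wikidump index`.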