Allow wikiextractor to leave out certain page IDs

Open timbicker opened this issue 6 years ago • 0 comments

I recently had the problem that, after several hours of processing, wikiextractor threw an error. I changed a few lines to filter out the page IDs that had already been processed, so that I could continue where I left off. There is currently at least one other issue that would benefit from such a solution (and is very similar to mine): #136.

To apply the solution, I would do the following: add another input parameter, --page_ids, which defaults to [0,infinity] and can be adjusted using the format [start_id,end_id].
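To make the proposal concrete, here is a minimal sketch of how such a parameter could be parsed and applied. It is not wikiextractor's actual code: the helper names `parse_page_id_range`, `in_range`, and `build_parser` are hypothetical, and treating the range as inclusive on both ends is an assumption, since the proposal does not say.

```python
import argparse
import math


def parse_page_id_range(text):
    """Parse '[start_id,end_id]' into an inclusive (start, end) tuple.

    Hypothetical helper: the accepted spellings for an open upper bound
    ('', 'inf', 'infinity') are an assumption, not part of the proposal.
    """
    body = text.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    parts = [p.strip() for p in body.split(",")]
    if len(parts) != 2:
        raise argparse.ArgumentTypeError("expected the form [start_id,end_id]")
    try:
        start = int(parts[0]) if parts[0] else 0
        open_ended = parts[1].lower() in ("", "inf", "infinity")
        end = math.inf if open_ended else int(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError("page ids must be integers") from None
    if start < 0 or end < start:
        raise argparse.ArgumentTypeError("need 0 <= start_id <= end_id")
    return start, end


def in_range(page_id, id_range):
    """Return True if page_id falls inside the inclusive range."""
    start, end = id_range
    return start <= int(page_id) <= end


def build_parser():
    """Build a parser holding only the proposed option (hypothetical)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--page_ids",
        type=parse_page_id_range,
        default=(0, math.inf),  # mirrors the proposed default [0,infinity]
        help="only process pages whose id lies in [start_id,end_id]",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Stand-in for the ids read from a dump; a real run would stream pages.
    for page_id in (12, 1500, 2500):
        if in_range(page_id, args.page_ids):
            print("process", page_id)
```

To resume after a crash, the start of the range would be set to the first page ID that was not yet processed, leaving the upper bound open. On the command line the bracketed value probably needs quoting so the shell does not interpret it.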

timbicker, Aug 05 '18 16:08