wp2txt

A command-line toolkit to extract text content and category data from Wikipedia dump files

Results 5 wp2txt issues

Version: 0.9.1. Hi, I found that some extracted titles are wrong, which seems to happen only occasionally. To reproduce the bug, run `wp2txt` twice as shown below. (The dump file I used is...

I am getting the error below and am not sure what the issue is. c:\Users\gopal\Downloads>wp2txt -i enwiki-20190820.bz2 -o wikitxt [DEPRECATION] This gem has been renamed to optimist and will no longer be supported. Please switch to...

Can we extract only one page, or a few specified pages, instead of processing millions of pages?

I am using a Google Cloud machine, so I would prefer not to use up too much disk space with Docker. I am running CentOS 8.

Hi, I'm getting a segmentation fault when extracting enwiki. CPU: ```processor : 31 vendor_id : AuthenticAMD cpu family : 25 model : 33 model name : AMD Ryzen 9 5950X 16-Core...