wikiextractor
For people hitting all kinds of weird bugs:
use pip install wikiextractor
instead of running the script without installing it.
That solved a lot of problems for me.
Thanks @xiaoouwang
Thanks @xiaoouwang - it seems the code and the README are not that helpful because they're inconsistent with each other,
e.g. you can't just run the script as the README implies unless you install it from PyPI, and even if you install with setup.py there are various issues.
I installed it with pip install wikiextractor, but when I try to run the command wikiextractor wiki/enwiki-latest-pages-articles-multistream.xml, I get this error:

Traceback (most recent call last):
  File "/usr/local/bin/wikiextractor", line 11, in <module>
    load_entry_point('wikiextractor==3.0.0', 'console_scripts', 'wikiextractor')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 480, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2693, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2324, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2330, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'wikiextractor.Wikiextractor'

Any ideas?
It's a module, so run python -m wikiextractor.WikiExtractor file
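If you want to check which spelling of the module name actually resolves in your environment, something like the sketch below may help. It only uses the standard library's importlib.util.find_spec; the helper name module_available is made up for illustration, and what it reports depends entirely on what you have installed.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located as an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "wikiextractor") is missing.
        return False


if __name__ == "__main__":
    # The first name is the one from the traceback, the second is the
    # one suggested in the reply above.
    for name in ("wikiextractor.Wikiextractor", "wikiextractor.WikiExtractor"):
        print(name, "->", module_available(name))
```

If only the second name is reported as available, the python -m form above should work even though the installed console script fails.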
exactly lol this package needs some minor fix. Glad that it helps!
It works! Thank you.
For anyone trying to train the XLM model by following that guide and finding that wikiextractor is filled with errors: install wikiextractor using pip, then use the solution quoted above (running it as a module) to replace the following line in get-data-wiki.sh
python -m $TOOLS_PATH/wikiextractor/wikiextractor/WikiExtractor.py $WIKI_PATH/bz2/$WIKI_DUMP_NAME --processes 8 -q -o - \
with
python -m wikiextractor.WikiExtractor $WIKI_PATH/bz2/$WIKI_DUMP_NAME --processes 8 -q -o - \
This makes wikiextractor run without a hitch.
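
If you would rather drive the extractor from Python than from the shell script, here is a rough sketch of the same invocation. The flags are copied from the replacement line above; build_command and the dump path passed on the command line are placeholders of mine, not part of wikiextractor or the XLM scripts. Using sys.executable is a deliberate choice so the module is looked up in the same interpreter that pip installed it into.

```python
import subprocess
import sys


def build_command(dump_path, processes=8):
    """Build the argv list for running WikiExtractor as a module."""
    return [
        sys.executable, "-m", "wikiextractor.WikiExtractor",
        str(dump_path),
        "--processes", str(processes),
        "-q",       # same flag as in the shell line above
        "-o", "-",  # same output setting as in the shell line above
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python run_extractor.py dump.xml.bz2
    subprocess.run(build_command(sys.argv[1]), check=True)
```

With check=True, a failing extractor run raises subprocess.CalledProcessError instead of passing silently.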