for people with all kinds of weird bugs

Open xiaoouwang opened this issue 4 years ago • 7 comments

Use pip install wikiextractor instead of running the script without installing it.

That solved a lot of problems for me.

xiaoouwang avatar Oct 13 '20 16:10 xiaoouwang

Thanks @xiaoouwang

arrrrrmin avatar Oct 14 '20 09:10 arrrrrmin

Thanks @xiaoouwang - it seems like the code and the README are not that helpful because they're inconsistent.

E.g., you can't just run the script as the README implies unless you install it from PyPI, and even if you install with setup.py there are various issues.

nmstoker avatar Oct 24 '20 14:10 nmstoker

I used pip install wikiextractor. When I tried to run the command wikiextractor wiki/enwiki-latest-pages-articles-multistream.xml, I got this error:

Traceback (most recent call last):
  File "/usr/local/bin/wikiextractor", line 11, in <module>
    load_entry_point('wikiextractor==3.0.0', 'console_scripts', 'wikiextractor')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 480, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2693, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2324, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2330, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'wikiextractor.Wikiextractor'

Any ideas?

tkunlin avatar Oct 28 '20 07:10 tkunlin

It's a module, so run it as one: python -m wikiextractor.WikiExtractor file (where file is the path to your dump).

xiaoouwang avatar Oct 28 '20 07:10 xiaoouwang
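A note on why this probably works: the traceback shows the console script looking for 'wikiextractor.Wikiextractor', while the command above uses wikiextractor.WikiExtractor. The two names differ only in letter case, and Python module names are case-sensitive. Here is a minimal sketch of that check. The helper name module_exists is made up for illustration, and json.decoder is only a standard-library stand-in so the snippet runs without wikiextractor installed.

```python
import importlib.util


def module_exists(dotted_name: str) -> bool:
    """Return True if Python can locate the module, respecting letter case."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return False


# Stand-in for the case mismatch in the traceback: the real module is
# json.decoder, so the differently-cased json.Decoder is not found.
print(module_exists("json.decoder"))
print(module_exists("json.Decoder"))
```

With the package installed, the same helper can be pointed at "wikiextractor.WikiExtractor" and "wikiextractor.Wikiextractor" to see which spelling your installed version actually provides.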

@nmstoker Exactly, lol. This package needs some minor fixes. Glad that it helps!

xiaoouwang avatar Oct 28 '20 09:10 xiaoouwang

It works! Thank you.

tkunlin avatar Oct 29 '20 03:10 tkunlin

For anyone training the XLM model by following that guide and finding that wikiextractor throws errors: install wikiextractor using pip, then apply the solution given earlier in this thread (python -m wikiextractor.WikiExtractor) by replacing the following line in get-data-wiki.sh:

python -m $TOOLS_PATH/wikiextractor/wikiextractor/WikiExtractor.py $WIKI_PATH/bz2/$WIKI_DUMP_NAME --processes 8 -q -o - \

with

python -m wikiextractor.WikiExtractor $WIKI_PATH/bz2/$WIKI_DUMP_NAME --processes 8 -q -o - \

This makes wikiextractor run without a hitch.

kdzhou avatar Nov 05 '20 19:11 kdzhou
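If you would rather drive the extraction from Python than edit the shell script, the same invocation can be sketched with subprocess. This is only an illustration, assuming wikiextractor was installed with pip: the function name build_extract_command is hypothetical, and the flags are copied from the command above rather than checked against every wikiextractor version.

```python
import sys


def build_extract_command(dump_path, processes=8):
    """Build the argument list for running WikiExtractor as a module."""
    return [
        sys.executable, "-m", "wikiextractor.WikiExtractor",
        dump_path,
        "--processes", str(processes),
        "-q",        # flags taken from the get-data-wiki.sh line above
        "-o", "-",
    ]


# Print the command instead of running it, so the sketch works even
# without wikiextractor or a dump file present.
print(" ".join(build_extract_command(
    "wiki/enwiki-latest-pages-articles-multistream.xml")))
```

To actually run it, pass the list to subprocess.run(..., check=True). Using sys.executable means the module is run with the same interpreter that pip installed it into, which avoids one common source of ModuleNotFoundError.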