ValueError: cannot find context for 'fork' & cannot pickle '_io.TextIOWrapper' object

Open Harry1035 opened this issue 1 year ago • 8 comments

On Windows, the following errors occur when extracting the Wikipedia corpus:

INFO: Starting page extraction from zhwiki-20240301-pages-articles-multistream.xml.bz2.
Traceback (most recent call last):
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 643, in <module>
    main()
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 639, in main
    process_dump(input_file, args.templates, output_path, file_size,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 417, in process_dump
    Process = get_context("fork").Process
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 243, in get_context
    return super().get_context(method)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'fork'

Edit: wikiextractor\WikiExtractor.py line 417

Process = get_context("fork").Process -> Process = get_context("spawn").Process

A new problem arises:

INFO: Starting page extraction from zhwiki-20240301-pages-articles-multistream.xml.bz2.
Traceback (most recent call last):
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\anaconda3\envs\myenv\Scripts\wikiextractor.exe\__main__.py", line 7, in <module>
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 639, in main
    process_dump(input_file, args.templates, output_path, file_size,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 425, in process_dump
    reduce.start()
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Harry1035 avatar Mar 20 '24 10:03 Harry1035

I got the same problem: "ValueError: cannot find context for 'fork'".

sghyan16 avatar Mar 25 '24 07:03 sghyan16

Thanks for your solution of switching to "spawn" mode! I am having the same issue on Windows. I tried several approaches, but none of them worked. I guess the problem lies in reduce = Process(target=reduce_process, args=(output_queue, output)), where the file object output is a non-picklable argument. See here. You could try running this code on Ubuntu or another Linux system.
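
To illustrate the pickling point: under "spawn", the arguments of a Process are serialized with pickle before being sent to the child, and an open text file cannot be pickled. The sketch below only checks picklability with the standard pickle module; it does not start any process. The helper names can_pickle and write_results are hypothetical and not part of wikiextractor; passing a path and opening the file inside the worker is one possible workaround, not necessarily what any fork does.

```python
import os
import pickle
import tempfile


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, which 'spawn' needs for Process args."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def write_results(queue, output_path):
    # Hypothetical worker: it receives a path and opens the file itself,
    # so only a plain string has to cross the process boundary.
    with open(output_path, "w", encoding="utf-8") as out:
        for item in iter(queue.get, None):  # None is the stop sentinel
            out.write(item)


# Create a temporary file so there is a real open handle to test
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w", encoding="utf-8") as handle:
    handle_ok = can_pickle(handle)  # open text file: pickle raises TypeError

path_ok = can_pickle(path)          # plain string: pickles fine
os.remove(path)

print(handle_ok, path_ok)
```

With "fork" the child inherits the parent's open files directly, which is presumably why the original code works on Linux without this change.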

dengmengjie avatar Mar 25 '24 14:03 dengmengjie

The issue has been resolved in my own fork qfcy/wikiextractor by making slight modifications to the code, and I've submitted a pull request.
Additionally, the fixed code can be successfully run on Windows. Here is a screenshot with ConEmu:

Image

qfcy avatar Mar 01 '25 11:03 qfcy

I installed wikiextractor by building it from source. When I use the code you changed, I still get the same problem. Can you help me solve it? Thanks.

Image

Happiness-in-Danger avatar Mar 21 '25 17:03 Happiness-in-Danger

@Happiness-in-Danger Oh, line 416 in WikiExtractor.py should be changed from get_context("fork").Process to get_context("spawn").Process.
I've recommitted it onto my fork. Thanks.
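
A minimal sketch of a variant that avoids hard-coding either start method, assuming the goal is to keep "fork" where the platform supports it and fall back to "spawn" elsewhere (for example on Windows). The helper name pick_start_method is hypothetical; multiprocessing.get_all_start_methods() and get_context() are standard-library calls.

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, otherwise 'spawn'."""
    available = mp.get_all_start_methods()
    return preferred if preferred in available else "spawn"


# 'spawn' is available on every platform, so get_context() cannot raise here
method = pick_start_method()
Process = mp.get_context(method).Process
print(method)
```

Note that code running under "spawn" needs picklable Process arguments and an entry point guarded by if __name__ == "__main__":, so changing the start method alone may not be sufficient.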

qfcy avatar Mar 22 '25 18:03 qfcy

I got it, thanks. I found some formatting errors at line 66 in WikiExtractor.py and at line 945 in extract.py. Also, the OutputSplitter class in WikiExtractor.py may need encoding='utf-8' added to its open() call.
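
A short sketch of why the explicit encoding matters: without it, open() in text mode uses a locale-dependent default encoding, which on Windows is often not UTF-8. The sample text and temporary file below are illustrative assumptions, not code from WikiExtractor.py.

```python
import os
import tempfile

text = "维基百科 Wikipédia\n"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Explicit encoding: the bytes written no longer depend on the locale default
with open(path, "w", encoding="utf-8") as out:
    out.write(text)

# Read back with the same encoding to confirm the round trip
with open(path, "r", encoding="utf-8") as src:
    round_trip = src.read()

os.remove(path)
print(round_trip == text)
```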

Happiness-in-Danger avatar Mar 23 '25 05:03 Happiness-in-Danger

OK, I've solved them and recommitted them onto my fork. Additionally, the error at line 66 in WikiExtractor.py is from the original WikiExtractor.py. Thanks for your correction!

qfcy avatar Mar 23 '25 15:03 qfcy

I know. When I ran the original code, this error did not occur; however, the terminal reported an error when I ran your fork. I rewrote the line and now it works.

Happiness-in-Danger avatar Mar 24 '25 03:03 Happiness-in-Danger