ValueError: cannot find context for 'fork' & cannot pickle '_io.TextIOWrapper' object
On Windows, the following error occurs when extracting the wiki corpus:
INFO: Starting page extraction from zhwiki-20240301-pages-articles-multistream.xml.bz2.
Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 643, in <module>
main()
File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 639, in main
process_dump(input_file, args.templates, output_path, file_size,
File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 417, in process_dump
Process = get_context("fork").Process
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 243, in get_context
return super().get_context(method)
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 193, in get_context
raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'fork'
Edit: wikiextractor\WikiExtractor.py line 417
Process = get_context("fork").Process -> Process = get_context("spawn").Process
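Instead of hard-coding either method, the start method could be chosen per platform. This is only a minimal sketch, not the project's actual code: `pick_start_method` is a hypothetical helper, and it relies on the standard `multiprocessing.get_all_start_methods()` call to check what the current OS supports ("fork" is not available on Windows, "spawn" is available everywhere).

```python
import multiprocessing
import sys


def pick_start_method(preferred="fork"):
    """Hypothetical helper: return `preferred` if supported here, else "spawn"."""
    available = multiprocessing.get_all_start_methods()
    return preferred if preferred in available else "spawn"


method = pick_start_method()
ctx = multiprocessing.get_context(method)
Process = ctx.Process  # same name that WikiExtractor.py assigns at this point
print(f"{sys.platform}: using start method {method!r}")
```

Note that switching to "spawn" also means every argument passed to `Process(...)` has to be picklable, which "fork" does not require.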
A new problem arises:
INFO: Starting page extraction from zhwiki-20240301-pages-articles-multistream.xml.bz2.
Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\ProgramData\anaconda3\envs\myenv\Scripts\wikiextractor.exe\__main__.py", line 7, in <module>
File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 639, in main
process_dump(input_file, args.templates, output_path, file_size,
File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 425, in process_dump
reduce.start()
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
I got the same error: "ValueError: cannot find context for 'fork'".
Thanks for your solution about switching to "spawn" mode! I am having the same issue on Windows. I tried several approaches but none of them worked. I guess the problem lies in reduce = Process(target=reduce_process, args=(output_queue, output)), where the file object output is not picklable.
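To illustrate the guess above: with "spawn", the arguments of a Process are pickled and sent to the child, and an open text file cannot be pickled, while a plain path string can. The sketch below is an assumption about one possible workaround, not the fix that was actually merged anywhere: `writer_worker` is a hypothetical stand-in for reduce_process that receives a path and opens the file itself. It is called in-process with a `queue.Queue` here only to keep the example simple.

```python
import os
import pickle
import queue
import tempfile


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def writer_worker(in_queue, out_path):
    """Hypothetical stand-in for reduce_process: opens the output file itself."""
    with open(out_path, "w", encoding="utf-8") as out:
        while True:
            item = in_queue.get()
            if item is None:  # sentinel: no more pages
                break
            out.write(item)


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.txt")

    with open(out_path, "w", encoding="utf-8") as handle:
        handle_ok = can_pickle(handle)  # open file object: not picklable
    path_ok = can_pickle(out_path)      # plain string: picklable

    q = queue.Queue()
    for item in ("page 1\n", "page 2\n", None):
        q.put(item)
    writer_worker(q, out_path)

    with open(out_path, encoding="utf-8") as f:
        written = f.read()

print(handle_ok, path_ok)
```

With a real child process, the same worker would be started as Process(target=writer_worker, args=(output_queue, out_path)), using a multiprocessing queue instead of `queue.Queue`.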
See here.
You could try running this code on Ubuntu or another Linux system.
The issue has been resolved in my own fork qfcy/wikiextractor by making slight modifications to the code, and I've submitted a pull request.
Additionally, the fixed code can be successfully run on Windows. Here is a screenshot with ConEmu:
I installed wikiextractor by building it from source.
I used the code you changed, but I still got the same problem.
Can you help me solve it? Thanks.
@Happiness-in-Danger Oh, line 416 in WikiExtractor.py should be changed from get_context("fork").Process to get_context("spawn").Process.
I've recommitted it to my fork. Thanks.
Got it, thanks. I also found some format errors at line 66 in WikiExtractor.py and at line 945 in extract.py, and the open() calls in class OutputSplitter in WikiExtractor.py may need encoding='utf-8' added.
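For context on the encoding suggestion: on Windows, open() without an explicit encoding typically falls back to the locale's code page rather than UTF-8, so text from a dump such as zhwiki may fail to encode or be written in an unexpected encoding. A minimal sketch of the explicit form follows; `write_and_read_utf8` and the file name are made up for illustration and are not taken from OutputSplitter.

```python
import os
import tempfile


def write_and_read_utf8(text):
    """Write text to a temp file as UTF-8 and read it back the same way."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wiki_00")  # hypothetical output file name
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()


sample = "维基百科 Wikipedia\n"
round_trip = write_and_read_utf8(sample)
print(round_trip == sample)
```

Passing encoding='utf-8' on both the writing and the reading side keeps the result independent of the system locale.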
OK, I've solved them and recommitted them onto my fork. Additionally, the error at line 66 in WikiExtractor.py is from the original WikiExtractor.py. Thanks for your correction!
I know. When I ran the original code, this error did not occur; however, the terminal reported an error when I ran your fork. I rewrote this line and it works now.