
make_wikipedia.py fails on Linux

Open · peterbjorgensen opened this issue 8 months ago · 9 comments

Traceback (most recent call last):
  File "/home/peter/kode/dolma/dolma_env/lib/python3.11/site-packages/dolma/core/parallel.py", line 283, in _multiprocessing_run_all
    multiprocessing.set_start_method("spawn")
  File "/usr/lib/python3.11/multiprocessing/context.py", line 247, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/peter/kode/dolma/scripts/make_wikipedia.py", line 289, in <module>
    main()
  File "/home/peter/kode/dolma/scripts/make_wikipedia.py", line 285, in main
    processor(date=args.date, lang=args.lang)
  File "/home/peter/kode/dolma/dolma_env/lib/python3.11/site-packages/dolma/core/parallel.py", line 390, in __call__
    fn(
  File "/home/peter/kode/dolma/dolma_env/lib/python3.11/site-packages/dolma/core/parallel.py", line 285, in _multiprocessing_run_all
    assert multiprocessing.get_start_method() == "spawn", "Multiprocessing start method must be spawn"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Multiprocessing start method must be spawn

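The failure is Linux-specific because the default start method there is fork (macOS has defaulted to spawn since Python 3.8). Once the context has already been fixed before dolma calls set_start_method("spawn"), the RuntimeError is caught and the fallback assertion fails. The bug can be worked around by calling multiprocessing.set_start_method("spawn") in the script's __main__ block, before main() runs.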
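A minimal sketch of that workaround, assuming a script shaped like make_wikipedia.py. `main()` here is only a stand-in for the script's real entry point, and `ensure_spawn` is a hypothetical helper, not part of dolma. It checks the current method first because set_start_method() raises once the context is fixed.

```python
import multiprocessing


def ensure_spawn() -> None:
    """Fix the global start method to 'spawn' (hypothetical helper, not dolma's API).

    set_start_method() raises RuntimeError once the context is fixed, so check first.
    """
    current = multiprocessing.get_start_method(allow_none=True)
    if current is None:
        # Nothing has fixed the context yet: safe to choose spawn.
        multiprocessing.set_start_method("spawn")
    elif current != "spawn":
        # Something already fixed a different method (e.g. 'fork', the Linux default in 3.11).
        raise RuntimeError(f"start method already fixed to {current!r}, expected 'spawn'")


def main() -> None:
    # Stand-in for the script's real main(), which builds the processor and calls it.
    print("start method:", multiprocessing.get_start_method())


if __name__ == "__main__":
    # Run before main(): dolma's set_start_method("spawn") will then raise, but its
    # fallback assertion get_start_method() == "spawn" passes.
    ensure_spawn()
    main()
```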

Perhaps dolma's core/parallel.py should use multiprocessing.get_context("spawn") instead, which returns a spawn context without changing the global start method, to avoid this.
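A minimal sketch of what that suggestion could look like. `SPAWN_CTX` and `run_all` are hypothetical names for illustration, not dolma's actual `_multiprocessing_run_all`. Because the pool is created from a private context, nothing here calls set_start_method(), so the "context has already been set" error cannot occur.

```python
import multiprocessing
from collections.abc import Callable, Iterable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")

# A private "spawn" context. Unlike multiprocessing.set_start_method("spawn"),
# this does not touch the global default context, so it cannot raise
# "context has already been set" and does not change what other code sees.
SPAWN_CTX = multiprocessing.get_context("spawn")


def run_all(fn: Callable[[T], R], items: Iterable[T], num_processes: int = 2) -> list[R]:
    """Map fn over items in spawned worker processes (hypothetical helper).

    fn must be picklable (e.g. defined at module top level), because spawned
    workers re-import it rather than inheriting the parent's memory.
    """
    with SPAWN_CTX.Pool(processes=num_processes) as pool:
        return pool.map(fn, items)
```

Callers would still invoke `run_all` from under `if __name__ == "__main__":`, since spawned children re-import the main module; for example, `run_all(math.factorial, [1, 2, 3])` returns `[1, 2, 6]`.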

peterbjorgensen, Oct 17 '23 13:10