Failing to use mp execution

Open alexandremuzio opened this issue 4 years ago • 4 comments

I am trying to use the MPExecutor but I am getting the following error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/data1/alferre/cc_net/cc_net/execution.py", line 145, in global_fn
    return f(*args[1:])
  File "/data1/alferre/cc_net/cc_net/mine.py", line 347, in _mine_shard
    output=tmp_output if not conf.will_split else None,
  File "/data1/alferre/cc_net/cc_net/jsonql.py", line 435, in run_pipes
    initargs=(transform,),
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/multiprocessing/process.py", line 110, in start
    'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data1/alferre/cc_net/cc_net/__main__.py", line 24, in <module>
    main()
  File "/data1/alferre/cc_net/cc_net/__main__.py", line 20, in main
    func_argparse.parse_and_call(parser)
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
    return command(**parsed_args)
  File "/data1/alferre/cc_net/cc_net/mine.py", line 509, in main
    regroup(conf)
  File "/data1/alferre/cc_net/cc_net/mine.py", line 364, in regroup
    mine(conf)
  File "/data1/alferre/cc_net/cc_net/mine.py", line 271, in mine
    ex(_mine_shard, repeat(conf), hashes_files, *_transpose(missing_outputs))
  File "/data1/alferre/cc_net/cc_net/execution.py", line 174, in __call__
    global_fn, zip(itertools.repeat(f_name), *args)
  File "/home/alferre/anaconda3/envs/mtdev/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AssertionError: daemonic processes are not allowed to have children

I am running the following command:

python -m cc_net mine --config /home/alferre/data/cc_net/config/config_alex.json

And this is my config file:

{
    "output_dir": "/home/alferre/data/cc_net/data_alex",
    "dump": "2019-09",
    "num_shards": 1,
    "num_segments_per_shard": 1,
    "hash_in_mem": 2,
    "mine_num_processes": 4,
    "lang_whitelist": [
        "pt"
    ],
    "execution": "mp",
    "target_size": "32M",
    "cleanup_after_regroup": false
}
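
For reference, the same assertion can apparently be reproduced without cc_net. The traceback shows `execution.py` dispatching `_mine_shard` from a `multiprocessing` pool worker, and `jsonql.run_pipes` then creating a second `Pool` inside that worker. The sketch below is only an assumption about the cause, not code from cc_net: `square`, `try_nested_pool` and `reproduce` are hypothetical names, and it assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp

# Assumption: POSIX system with the "fork" start method (not available on Windows).
CTX = mp.get_context("fork")


def square(x):
    return x * x


def try_nested_pool(_):
    """Runs inside an outer Pool worker and tries to open an inner Pool."""
    # Pool workers are started as daemonic processes.
    is_daemon = mp.current_process().daemon
    try:
        with CTX.Pool(2) as inner:
            inner.map(square, range(4))
        outcome = "inner pool started"
    except AssertionError as exc:
        # Same assertion as in the traceback above.
        outcome = str(exc)
    return is_daemon, outcome


def reproduce():
    """Hypothetical helper: run one task in an outer Pool and report the result."""
    with CTX.Pool(1) as outer:
        return outer.map(try_nested_pool, [0])[0]


if __name__ == "__main__":
    print(reproduce())
```

If this is what happens here, the combination of `"execution": "mp"` and `"mine_num_processes": 4` in my config may be what triggers the nested pool, but I have not confirmed that.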

alexandremuzio, Feb 06 '20 19:02