DeepSeek-Coder
ERROR: ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers' (most likely due to a circular import)
I just downloaded the repo and ran Evaluation/HumanEval/eval.sh from bash (with deepseek-coder-1.3b-base), without modifying anything.
It fails with the following errors:
Reading samples...
100%|████████████████████████████████████████████████████████████████████████████████████████| 164/164 [00:00<00:00, 13548.93it/s]
Running test suites...
0%| | 0/164 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/tiger/deepcode/Evaluation/HumanEval/eval_pal.py", line 42, in <module>
evaluator.eval_model(model, accelerator)
File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/tiger/deepcode/Evaluation/HumanEval/humaneval.py", line 125, in eval_model
self._calculate_final_score(accelerator)
File "/opt/tiger/deepcode/Evaluation/HumanEval/humaneval.py", line 159, in _calculate_final_score
res = evaluate_functional_correctness(input_file=logfilepath, problem_file=os.path.join(self.data_root, f"humaneval-{self.language}.jsonl"), tmp_dir=self.log_dir, timeout=timeout, language=runlang)
File "/opt/tiger/deepcode/Evaluation/HumanEval/human_eval/evaluation.py", line 277, in evaluate_functional_correctness
result = future.result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 433, in result
return self.__get_result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/tiger/deepcode/Evaluation/HumanEval/human_eval/execution.py", line 549, in check_correctness
manager = Manager()
File "/usr/lib/python3.9/multiprocessing/context.py", line 55, in Manager
from .managers import SyncManager
ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers' (most likely due to a circular import) (/usr/lib/python3.9/multiprocessing/managers.py)
2024-02-01 13:34:30.269 n188-182-020:19533:23012 [1] NCCL INFO [Service thread] Connection closed by localRank 0
2024-02-01 13:34:30.269 n188-182-020:19532:23014 [0] NCCL INFO [Service thread] Connection closed by localRank 0
2024-02-01 13:34:30.269 n188-182-020:19534:23013 [2] NCCL INFO [Service thread] Connection closed by localRank 0
2024-02-01 13:34:34.341 n188-182-020:19532:19532 [0] NCCL INFO comm 0xb9bbef60 rank 0 nranks 3 cudaDev 0 busId 1a000 - Abort COMPLETE
[2024-02-01 13:34:37,641] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19533 closing signal SIGTERM
[2024-02-01 13:34:37,641] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19534 closing signal SIGTERM
[2024-02-01 13:34:38,307] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 19532) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1033, in <module>
main()
File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1029, in main
launch_command(args)
File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1014, in launch_command
multi_gpu_launcher(args)
File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 672, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
eval_pal.py FAILED
Failures: <NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-02-01_13:34:37
host : n188-182-020.byted.org
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 19532)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
This looks like a multiprocessing import problem, since I haven't changed any of the code. Could you help me solve it?
@kokolerk Hello, have you solved this problem?
This is probably a Python version issue. Changing the environment's Python version to 3.8 will likely fix it.
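check_correctness can't run under the ThreadPoolExecutor context, because it calls Manager() inside a worker thread (see execution.py line 549 in the traceback). Removing the ThreadPoolExecutor and calling check_correctness in a plain for loop should solve it, although the evaluation will run a bit slower.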
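For anyone who wants to try this, here is a rough sketch of what the change could look like. It is not the actual code from human_eval/evaluation.py: the check_correctness below is a hypothetical stand-in with a simplified signature, and the sample fields (task_id, completion) are assumptions for illustration. The point is only that each sample is checked directly in the calling thread instead of being submitted to a thread pool.

```python
# Before (roughly): work was submitted to a thread pool, e.g.
#   with ThreadPoolExecutor(max_workers=n_workers) as executor:
#       futures = [executor.submit(check_correctness, s, timeout) for s in samples]
#       results = [f.result() for f in as_completed(futures)]
# After: call the function directly in a for loop.


def check_correctness(sample, timeout):
    """Hypothetical stand-in for the repo's check_correctness.

    The real function executes the completion against the test suite;
    here we only mark non-empty completions as passed so that the
    sketch is self-contained.
    """
    passed = bool(sample["completion"].strip())
    return {"task_id": sample["task_id"], "passed": passed}


def evaluate_sequentially(samples, timeout=3.0):
    """Check every sample one after another in the calling thread."""
    results = []
    for sample in samples:
        # No executor.submit() and no future.result(): the call runs in the
        # main thread, so nothing is imported from a worker thread.
        results.append(check_correctness(sample, timeout))
    return results


# Made-up sample data, only to show the call shape.
samples = [
    {"task_id": "HumanEval/0", "completion": "    return True\n"},
    {"task_id": "HumanEval/1", "completion": "   "},
]
results = evaluate_sequentially(samples)
```

Results come back in input order here, whereas as_completed yields them in completion order, so any code that relied on that ordering may need a small adjustment.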