InternEvo
[Bug] Error when fine-tuning InternLM on Ascend 910
Describe the bug
Traceback (most recent call last):
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
BrokenPipeError: [Errno 32] Broken pipe
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
During handling of the above exception, another exception occurred:
  File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
Traceback (most recent call last):
BrokenPipeError: [Errno 32] Broken pipe
Environment
python==3.8 torch==2.0.1
Other information
No response
Hello @rourouZ, it looks like the error stack printed by torch_npu does not contain much useful information. The environment we use for adapting to Huawei NPUs is:
torch: 2.1.0+cpu
torch_npu: 2.1.0.post3+git7c4136d
cann: 8.0.RC1.alpha003
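You could try running in this environment; it should work, based on my testing. If you have any questions, you can also @ me in the InternLM discussion group.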
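To compare a local setup against the versions listed above, a minimal Python sketch like the following may help. It is not part of InternEvo: the helper names `installed_versions` and `find_mismatches` are hypothetical, and it assumes the packages are registered under the distribution names `torch` and `torch_npu`. CANN is not a Python package, so it is not checked here.

```python
from importlib import metadata

# Versions quoted in the reply above. The distribution names are an assumption.
EXPECTED = {
    "torch": "2.1.0+cpu",
    "torch_npu": "2.1.0.post3+git7c4136d",
}


def installed_versions(packages):
    """Return {name: installed version, or None if the package is not found}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, expected=EXPECTED):
    """Return {name: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    mismatches = find_mismatches(installed_versions(EXPECTED))
    if not mismatches:
        print("torch and torch_npu match the versions listed above")
    for name, (have, want) in mismatches.items():
        print(f"{name}: installed={have!r}, expected={want!r}")
```

With the environment from the original report (torch==2.0.1), this would flag `torch` as a mismatch. An exact string comparison is deliberately strict; a different build of the same release would also be reported.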
Could you please share the NPU image that ran successfully? Many thanks!
You can try this one: docker pull internlm/opencompass:opencompass-20240607