mmskeleton

CUDA RUNTIME ERROR when build_dataset_example

Open trungmanhhuynh opened this issue 4 years ago • 4 comments

Hi,

I tried to build the example dataset using this command:

 mmskl configs/utils/build_dataset_example.yaml --gpus 1

But I got this CUDA runtime error:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCGeneral.cpp line=54 error=3 : initialization error
Process Process-4:
Traceback (most recent call last):
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/manhh/github/mmskeleton/mmskeleton/processor/skeleton_dataset.py", line 21, in worker
    detection_cfg, estimation_cfg, device=gpu)
  File "/home/manhh/github/mmskeleton/mmskeleton/apis/estimation.py", line 30, in init_pose_estimator
    detection_model = detection_model.cuda()
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 311, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 230, in _apply
    param_applied = fn(param)
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 311, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/manhh/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/cuda/__init__.py", line 179, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (3) : initialization error at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCGeneral.cpp:54

Could you please help? I guess it comes from the call to Process:

    for i in range(num_worker):
        p = Process(
            target=worker,
            args=(inputs, results, i % gpus, detection_cfg, estimation_cfg))
        procs.append(p)
        p.start()

But I am not sure.

The other commands work fine for me.

Thank you,

trungmanhhuynh, Jun 02 '20 03:06

I faced the same issue. First I edited the mmskl.py file:

if __name__ == "__main__":
    torch.multiprocessing.set_start_method('spawn')
    main()

Then I ran this command:

 mmskl configs/utils/build_dataset_example.yaml

I first tried adding --gpus 0 to the above command, but that didn't work for me.
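One caveat with this kind of edit: `set_start_method` may only be called once per process, and a second call raises `RuntimeError` unless `force=True` is passed. Below is a minimal sketch of a guard that tolerates repeated calls, using only the standard library. The helper name `ensure_spawn` is hypothetical and is not part of mmskeleton.

```python
import multiprocessing as mp


def ensure_spawn():
    """Switch the global start method to 'spawn' if it is not already set so.

    Hypothetical helper for illustration; not part of mmskeleton.
    """
    # allow_none=True returns None when no start method has been fixed yet,
    # instead of silently fixing the platform default.
    if mp.get_start_method(allow_none=True) != "spawn":
        # force=True avoids the RuntimeError raised when a start method
        # was already set earlier in this process.
        mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    print(ensure_spawn())
```

Calling a guard like this at the top of the `if __name__ == "__main__":` block, before `main()`, would have the same effect as the one-line edit above while staying safe if some other import has already chosen a start method.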

kaiser-hamid-rabbi, Jun 07 '20 05:06

I encountered the same problem. Is there a solution?

liqier, Jun 14 '20 10:06

I had the same problem. You just need to use multiprocessing instead of torch.multiprocessing. Here is my fix, edited into the mmskl.py file:

import multiprocessing as mp
if __name__ == "__main__":
    mp.set_start_method('spawn')
    main()

It works for me!
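For context on why this probably helps: on Linux the default start method is fork, and the CUDA runtime does not support being used in a forked child, so PyTorch requires spawn or forkserver for CUDA work in subprocesses. Below is a minimal, standard-library-only sketch of the worker loop from the original post using a spawn context instead of changing the global start method. The `worker` body, `assign_devices`, and `run_workers` are illustrative stand-ins, not mmskeleton's actual code.

```python
import multiprocessing as mp


def worker(device_id, results):
    # Stand-in for the real worker. In mmskeleton the model would be moved
    # to the GPU here, inside the freshly spawned child process.
    results.put((device_id, mp.current_process().name))


def assign_devices(num_worker, gpus):
    # Same round-robin mapping as `i % gpus` in the loop from the original post.
    return [i % gpus for i in range(num_worker)]


def run_workers(num_worker, gpus):
    # A spawn context starts each child in a fresh interpreter, so no
    # state is inherited from the parent via fork.
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    procs = []
    for device_id in assign_devices(num_worker, gpus):
        p = ctx.Process(target=worker, args=(device_id, results))
        procs.append(p)
        p.start()
    # Drain the queue before joining; the timeout avoids hanging forever
    # if a child fails to start.
    out = [results.get(timeout=60) for _ in procs]
    for p in procs:
        p.join()
    return sorted(device for device, _ in out)


if __name__ == "__main__":
    # The __main__ guard is required with spawn, because each child
    # re-imports this module.
    print(run_workers(num_worker=4, gpus=2))
```

Using `get_context("spawn")` keeps the change local to the code that launches workers, which may be preferable to editing the entry-point script if other parts of the program rely on the default start method.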

Richard-Codes, Sep 09 '20 02:09

How did you solve this problem, guys?

MaarufB, Mar 13 '21 06:03