headnerf

Training

Open andreluizbvs opened this issue 3 years ago • 2 comments

Hello,

I'm trying to run the training script (train.py) on one of the datasets (specifically on comprehensive_cars), and I'm having some problems.

Just for the record, I'm doing that as a first step, since my ultimate goal is to fine-tune a pre-trained model on my own custom dataset.

After following all the installation steps and dataset downloads shown in the README.md file, I run $ python train.py configs/256res/cars_256.yaml and mainly two error/warning messages are printed repeatedly:

Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Warning: Error occurred when loading file data/comprehensive_cars/images/*.jpg

  • In this warning, the * stands for a different image filename from the dataset each time the message appears. It is highly probable that none of the images could be loaded.

So, following the CUDA-related message, I switched to the 'spawn' start method by adding torch.multiprocessing.set_start_method('spawn') at the beginning of train.py, but that line raises an exception:

  File "media/<user>/HD/HF_train/headnerf/train.py", line 14, in <module>
    torch.multiprocessing.set_start_method('spawn')
  File "/home/<user>/anaconda3/envs/test2/lib/python3.8/multiprocessing/context.py", line 243, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set

Can anyone tell me what this means? Or whether I'm even on the right path to successfully run train.py?
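
For reference, a minimal sketch of how this RuntimeError arises and two ways it is commonly handled. The traceback ends in the standard library's multiprocessing/context.py, so the sketch uses the standard-library module directly. The helper names set_spawn_once and force_spawn are hypothetical, and this only reproduces the error message; whether it also resolves the image-loading warnings in train.py is an assumption that has not been verified.

```python
import multiprocessing as mp


def set_spawn_once():
    """Select the 'spawn' start method, tolerating an earlier selection."""
    try:
        mp.set_start_method("spawn")
        return "set"
    except RuntimeError:
        # Raised with 'context has already been set' when a start method
        # was already fixed earlier (by an import or a previous call).
        return "already set: " + str(mp.get_start_method())


def force_spawn():
    """Override whatever start method was selected earlier."""
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    print(set_spawn_once())
    print(force_spawn())
```

The start method can only be chosen once per process unless force=True is passed, so the exception suggests that something executed before line 14 of train.py had already fixed it. Another option that leaves the global setting alone is multiprocessing.get_context("spawn"), which returns a context object with its own start method.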

Kind regards, André

andreluizbvs avatar Sep 19 '22 13:09 andreluizbvs

I'm having the same problem! No clue yet on how to solve it, so I would appreciate any help!

GustavoCamargoRL avatar Sep 21 '22 12:09 GustavoCamargoRL

Hi. I'm having the same problems. Have you solved them? Thank you.

ohjarwa avatar Nov 12 '22 08:11 ohjarwa