SelectionGAN
SelectionGAN copied to clipboard
CNDNN_ERROR ?
Hello Sir,
Using my-datasets, I tried to train your code. But I met CUDNN-ERROR.
...
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [374,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [374,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/train.py", line 40, in <module>
trainer.run_generator_one_step(data_i)
File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/trainers/pix2pix_trainer.py", line 35, in run_generator_one_step
g_losses, generated = self.pix2pix_model(data, mode='generator')
File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/pix2pix_model.py", line 46, in forward
input_semantics, real_image)
File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/pix2pix_model.py", line 136, in compute_generator_loss
input_semantics, real_image, compute_kld_loss=self.opt.use_vae)
File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/pix2pix_model.py", line 198, in generate_fake
fake_image = self.netG(input_semantics, z=z)
File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/networks/generator.py", line 90, in forward
x = self.fc(x)
File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f44b3256a22 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10aa3 (0x7f44b34b7aa3 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7f44b34b9147 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f44b32405a4 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0xa2f382 (0x7f4558065382 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xa2f421 (0x7f4558065421 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #21: __libc_start_main + 0xe7 (0x7f455add0b97 in /lib/x86_64-linux-gnu/libc.so.6)
How to solve it??
Thanks, Edward Cho.
Please provide more information on how you train.
Hello sir. I tried to train image-to-image translation using your code.
My dataset is as follows:
- Blur image vs clear image : paired image set, gray scale
- Because I could not prepare labeled image, i think that blur image same to labeled image.
If i couldn't prepare "semantic labeled image", i can't use your code??
Did you successfully run my code with my dataset?
You can run the code without using "semantic labeled image".
How many channel dimensions is the blurred image? It should be 3, if not, you need to change the code.
I am having the same issue! When I use the ADE dataset images it trains with no issues but when I use my own, with the same bit depth, it gives me this error!
I think figured it out. It's basically an overload error I think. Decreasing the amount of images may help with that error. I don't know why a lower batch size still produces that error though, I now have to reduce my image count to just 25. This is code breaking. @Ha0Tang, please fix this or explain what I can be doing wrong.