Image-Adaptive-3DLUT

Tensor device problems in code

Open onpix opened this issue 3 years ago • 2 comments

In some situations, the training code raises a device error:

CUDA error: an illegal memory access was encountered

After debugging, I found that the main cause is that the tensors used during training are not all on the same device. For example:

In Generator3DLUT_identity and Generator3DLUT_zero, self.LUT is on the CPU. In TrilinearInterpolationFunction, int_package and float_package are also on the CPU. However, the inputs and outputs of the network are CUDA tensors, so a device error can occur when running the model.

To solve the problem, it's better to initialize all tensors so that their device follows the input dynamically, instead of initializing them on a fixed device.
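A minimal sketch of the mismatch described above (simplified, not the repo's actual code): a LUT created as a plain attribute stays on the CPU even after the module is moved to CUDA, because `.cuda()`/`.to()` only move parameters and registered buffers.

```python
import torch
import torch.nn as nn

# Hypothetical, simplified stand-in for Generator3DLUT_identity / _zero.
class Generator3DLUTSketch(nn.Module):
    def __init__(self, dim=33):
        super().__init__()
        # Fixed-device init: a plain attribute is NOT moved by .cuda()/.to(),
        # so this tensor stays on the CPU even if the module is moved to GPU.
        self.LUT = torch.zeros(3, dim, dim, dim)

    def forward(self, x):
        # If x is a CUDA tensor, mixing it with the CPU-resident self.LUT
        # in an op triggers a device-mismatch error at runtime.
        return x + self.LUT.mean()
```

On a CPU-only run this works, which is why the bug only surfaces once training moves to the GPU.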

onpix · Mar 17 '21 09:03

A good way to initialize such tensors is:

tensor = torch.FloatTensor(...).type_as(input)

or

tensor = torch.FloatTensor(...).to(input.device)

which guarantees that all tensors end up on the same device as the input.
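The pattern above can be sketched in a small module (hypothetical class name, standing in for the repo's LUT generators): register the LUT as a parameter or buffer so `.to(device)` moves it, and create any temporaries inside `forward` on the input's device.

```python
import torch
import torch.nn as nn

# Sketch of the device-agnostic pattern, under the assumptions stated above.
class DeviceSafeLUT(nn.Module):
    def __init__(self, dim=33):
        super().__init__()
        # As a Parameter, the LUT is moved along with the module by
        # .cuda() / .to(device), unlike a plain tensor attribute.
        self.LUT = nn.Parameter(torch.zeros(3, dim, dim, dim))

    def forward(self, x):
        # Temporaries follow the input's device (and dtype, with type_as):
        scale = torch.ones(1).type_as(x)        # same dtype and device as x
        offset = torch.zeros(1).to(x.device)    # same device as x
        return x * scale + offset + self.LUT.mean()
```

With this structure, `model.to("cuda")` moves everything together, and no tensor is pinned to a fixed device at construction time.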

onpix · Mar 17 '21 09:03

> (quoting onpix's report above)

Could you explain more? I ran into this problem, but I don't know how to apply this fix. Do you mean that we can put the LUT and TrilinearInterpolationFunction on the GPU?

zyhrainbow · Dec 13 '21 01:12