land_cover_classification_unet
RuntimeError in Half precision: type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor
Hi, I'm running the code with the sample training data on Windows 10. If I run it in float precision, I get a "CUDA out of memory" error.
If I run it in the recommended half-precision way from the source code:
net.half() #convert the model to half precision
#################
#run prediction
#calculate loss
##################
net.float() #convert the model back to full precision to continue
#with backward propagation
there was a runtime error as follows:
Traceback (most recent call last):
  File "train.py", line 232, in <module>
    val_percent=args.val / 100)
  File "train.py", line 123, in train_net
    loss.backward()
  File "C:\ProgramData\Anaconda3\envs\pytorch-py3.7\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\ProgramData\Anaconda3\envs\pytorch-py3.7\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: Expected tensor for argument #1 'grad_output' to have the same type as tensor for argument #2 'weight'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_backward_input)
Do you know the reason for this? Thanks.
PS: I installed PyTorch on my laptop with: conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch. The original CUDA driver version on my laptop is NVIDIA CUDA 11.4.94.
Did you convert the model to full precision before running backpropagation?
-- Srimannarayana Baratam
I think the conversion is already in place. I downloaded the sample dataset to a local directory instead of Google Drive. I have only just begun testing the code, so I only changed the training dataset path to the local directory and did not change anything else in the source code:
# convert the prediction to float32 for avoiding nan in loss calculation
masks_pred = masks_pred.type(torch.float32)
It's been a while since I last ran the code, but I am confident that the error can be traced with the debugger in any IDE. It is a precision mismatch between the provided input and the expected input.
-- Srimannarayana Baratam
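For anyone tracing a mismatch like this, a small helper that reports parameter dtypes can show where the model and a tensor disagree. The following is only a sketch: `param_dtypes` and `same_precision` are hypothetical helper names that are not part of this repository, and the single `Conv2d` layer is an assumed stand-in for the U-Net.

```python
from collections import Counter

import torch
import torch.nn as nn


def param_dtypes(module):
    """Return a Counter mapping each dtype to the number of parameters using it."""
    return Counter(p.dtype for p in module.parameters())


def same_precision(module, tensor):
    """Return True if every parameter of `module` has the same dtype as `tensor`."""
    return all(p.dtype == tensor.dtype for p in module.parameters())


# Hypothetical stand-in for the U-Net (weight + bias = 2 parameters)
net = nn.Conv2d(3, 4, kernel_size=3, padding=1)

net.half()
half_report = param_dtypes(net)  # every parameter is torch.float16

# A tensor produced while the model was in half precision keeps its dtype
activation = torch.zeros(1, 4, 8, 8, dtype=torch.float16)
matches_before = same_precision(net, activation)

net.float()
float_report = param_dtypes(net)  # every parameter is torch.float32
matches_after = same_precision(net, activation)  # Half tensor vs Float weights

print(half_report, float_report, matches_before, matches_after)
```

Running a check like this just before `loss.backward()` in `train_net` may help confirm whether the tensors recorded during the forward pass still match the dtype of the model's weights.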
Hello, I have run into the same problem. Have you solved it?
@yulijun1220 Have you figured it out?
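One commonly used alternative to switching the model between `net.half()` and `net.float()` by hand is PyTorch's automatic mixed precision (`torch.autocast` together with `torch.cuda.amp.GradScaler`), which keeps the weights in float32 and casts only inside the forward pass. The following is a minimal sketch and not the repository's training loop: the single `Conv2d` layer, the tensor shapes, and the variable names are assumptions, and whether this avoids the out-of-memory error on a particular GPU is untested.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # mixed precision is only enabled when a GPU is present
amp_dtype = torch.float16 if use_amp else torch.bfloat16

# Hypothetical stand-in for the U-Net: 3 input channels, 4 classes
net = nn.Conv2d(3, 4, kernel_size=3, padding=1).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

# Dummy batch (assumed shapes): 2 images of 16x16 with per-pixel class labels
imgs = torch.randn(2, 3, 16, 16, device=device)
true_masks = torch.randint(0, 4, (2, 16, 16), device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_amp):
    masks_pred = net(imgs)  # runs in half precision on the GPU
    # compute the loss in float32, as the original code does to avoid NaN
    loss = criterion(masks_pred.float(), true_masks)

# The weights were never converted, so backward sees consistent dtypes
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

print(net.weight.dtype, net.weight.grad.dtype, loss.item())
```

Because the model is never converted with `net.half()` here, the weights and their gradients stay in float32 for the whole step, which is the condition the `cudnn_convolution_backward_input` check in the traceback complains about.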