land_cover_classification_unet

RuntimeError in Half precision: type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor

Open yulijun1220 opened this issue 3 years ago • 5 comments

Hi, I'm running the code with the sample training data on Windows 10. If I run it in float precision, I get a "CUDA out of memory" error. If I run it the half-precision way recommended in the source code:

    net.half()  # convert the model to half precision
    #################
    # run prediction
    # calculate loss
    ##################
    net.float()  # convert the model back to full precision to continue
                 # with backward propagation

I get the following runtime error:

    Traceback (most recent call last):
      File "train.py", line 232, in <module>
        val_percent=args.val / 100)
      File "train.py", line 123, in train_net
        loss.backward()
      File "C:\ProgramData\Anaconda3\envs\pytorch-py3.7\lib\site-packages\torch\_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "C:\ProgramData\Anaconda3\envs\pytorch-py3.7\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: Expected tensor for argument #1 'grad_output' to have the same type as tensor for argument #2 'weight'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_backward_input)

Do you know the reason for this? Thanks.

PS: I installed PyTorch on my laptop with: conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch. The original version of the CUDA driver on my laptop is NVIDIA CUDA 11.4.94.

yulijun1220 avatar Nov 24 '21 10:11 yulijun1220

Did you convert the model to full precision before running backpropagation?

-- Srimannarayana Baratam

srimannarayanabaratam avatar Nov 24 '21 13:11 srimannarayanabaratam

I think the conversion is already in place. I downloaded the sample dataset to a local directory instead of Google Drive. I have only just started testing the code, so the only thing I changed was the training dataset path, which now points to the local directory; I did not change anything else in the source code. The source already contains this conversion:

            # convert the prediction to float32 for avoiding nan in loss calculation
            masks_pred = masks_pred.type(torch.float32)

yulijun1220 avatar Nov 24 '21 15:11 yulijun1220

It's been a while since I last ran the code, but I am confident that the error can be traced back with the debugger in any IDE. It is a precision mismatch between the provided input and the expected input.
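
For readers hitting the same error: one likely reading of the traceback is that the forward pass ran while the weights were float16, so autograd recorded half-precision tensors, and the later net.float() call turned the weights into float32 before loss.backward() reached the convolution layers. A common alternative that avoids switching the model's dtype by hand is PyTorch's automatic mixed precision (torch.autocast with a GradScaler). The following is a minimal sketch under stated assumptions, not the repository's code: the small net, the random imgs and true_masks, and the SGD and CrossEntropyLoss choices are placeholders standing in for the U-Net and the sample data.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the repository's U-Net; any nn.Module behaves the same way here.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),
)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
net.to(device)

# Placeholder optimizer and loss (assumptions, not taken from train.py).
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Loss scaling guards float16 gradients against underflow on CUDA.
# With enabled=False (CPU run) the scaler simply passes values through.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

# Random tensors standing in for one batch of images and class-index masks.
imgs = torch.randn(2, 3, 16, 16, device=device)
true_masks = torch.randint(0, 2, (2, 16, 16), device=device)

optimizer.zero_grad()

# The weights stay float32 throughout; autocast only lowers the precision
# of eligible operations inside this block, so no net.half()/net.float() calls.
with torch.autocast(device_type=device.type, enabled=use_cuda):
    masks_pred = net(imgs)
    # Compute the loss on a float32 prediction, as the original code intends.
    loss = criterion(masks_pred.float(), true_masks)

# Backward runs against the same float32 weights the forward pass used.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

param_dtypes = {p.dtype for p in net.parameters()}
grad_dtypes = {p.grad.dtype for p in net.parameters()}
print(param_dtypes, grad_dtypes)
```

Because the weights never leave float32 in this variant, the dtype of the gradients and the weights cannot drift apart between the forward and backward passes. Whether this saves enough memory to avoid the out-of-memory error depends on the GPU and batch size. Newer PyTorch releases may warn that torch.cuda.amp.GradScaler is deprecated in favor of torch.amp.GradScaler.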

srimannarayanabaratam avatar Nov 24 '21 17:11 srimannarayanabaratam

Hello, I ran into the same problem. Have you solved it?

Yeah21 avatar Jun 01 '23 09:06 Yeah21

@yulijun1220 Have you figured it out?

rohit7044 avatar Sep 27 '23 10:09 rohit7044