
Expected tensor for argument #1 'input' to have the same type as tensor for argument #2 'rois'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor

Open sarmientoj24 opened this issue 4 years ago • 7 comments

Getting this while training a Faster R-CNN. The training loop:

for epoch in range(num_epochs):
  model.train()
  i = 0    
  for imgs, annotations in data_loader:
    i += 1
    total_processed += 1
    imgs = list(img.to(device) for img in imgs)
    annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
    loss_dict = model(imgs, annotations)
    losses = sum(loss for loss in loss_dict.values())

    optimizer.zero_grad()
    # scale the loss through apex amp instead of calling losses.backward() directly
    with amp.scale_loss(losses, optimizer) as scaled_loss:
      scaled_loss.backward()
    optimizer.step()
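For context, the loop above assumes `model` and `optimizer` were already wrapped with apex's `amp.initialize`. One workaround that has been suggested for this RoIAlign dtype mismatch (an assumption here, not a fix confirmed by the apex maintainers in this thread) is to register torchvision's `roi_align` as a float function before initializing amp, so its inputs are cast back to FP32 under `opt_level="O1"`:

```python
# Hypothetical setup sketch: register torchvision's roi_align so apex amp
# keeps it in FP32 under opt_level O1. Guarded so the snippet also runs on a
# machine without apex/torchvision installed.
try:
    import torch
    import torchvision  # noqa: F401  (loads the torch.ops.torchvision namespace)
    from apex import amp

    # Force roi_align inputs to FP32 even when surrounding ops run in FP16.
    amp.register_float_function(torch.ops.torchvision, "roi_align")
    # model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
    status = "registered"
except Exception:
    status = "unavailable"

print(status)
```

The registration must happen before `amp.initialize`, otherwise apex's casting rules are already frozen.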

I get this RuntimeError:

  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-65-8fac2bdda8e5> in <module>()
     15     # imgs = torch.as_tensor(imgs, dtype=torch.float32)
     16     annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
---> 17     loss_dict = model(new_imgs, annotations)
     18     losses = sum(loss for loss in loss_dict.values())
     19 

6 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     69             features = OrderedDict([('0', features)])
     70         proposals, proposal_losses = self.rpn(images, features, targets)
---> 71         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
     72         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
     73 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/roi_heads.py in forward(self, features, proposals, image_shapes, targets)
    752             matched_idxs = None
    753 
--> 754         box_features = self.box_roi_pool(features, proposals, image_shapes)
    755         box_features = self.box_head(box_features)
    756         class_logits, box_regression = self.box_predictor(box_features)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torchvision/ops/poolers.py in forward(self, x, boxes, image_shapes)
    194                 output_size=self.output_size,
    195                 spatial_scale=scales[0],
--> 196                 sampling_ratio=self.sampling_ratio
    197             )
    198 

/usr/local/lib/python3.6/dist-packages/torchvision/ops/roi_align.py in roi_align(input, boxes, output_size, spatial_scale, sampling_ratio, aligned)
     43     return torch.ops.torchvision.roi_align(input, rois, spatial_scale,
     44                                            output_size[0], output_size[1],
---> 45                                            sampling_ratio, aligned)
     46 
     47 

RuntimeError: Expected tensor for argument #1 'input' to have the same type as tensor for argument #2 'rois'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for ROIAlign_forward_cuda)

Environment

CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: 3.12.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 418.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.5.0+cu101
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.3.1
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect

sarmientoj24 avatar May 12 '20 16:05 sarmientoj24

Did you get this error fixed? I am receiving the same runtime error. The model works perfectly when I run it on its own. I receive this runtime error only when I run it with another model simultaneously.

oooolga avatar Jun 08 '20 04:06 oooolga

Not yet. No answer from the apex maintainers.

sarmientoj24 avatar Jun 08 '20 08:06 sarmientoj24

I am also getting the same error. Has it been fixed?

sharat29ag avatar Apr 13 '21 05:04 sharat29ag

I ran into the same problem. Is there any solution yet?

RuntimeError: Expected tensor for argument #1 'grad_output' to have the same type as tensor for argument #2 'weight'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_backward_input)

yulijun1220 avatar Nov 29 '21 13:11 yulijun1220

I am also running into this issue.

ucasyjz avatar Apr 19 '22 08:04 ucasyjz

Same problem here, but I fixed it by calling .float(). The error refers to the tensor dtype: torch.float16 is the half-precision counterpart of torch.float32.
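For anyone following along, a minimal illustration of that cast (guarded so it also runs without torch installed; on a GPU the FP16 tensor would print as torch.cuda.HalfTensor, matching the error message):

```python
# Minimal sketch of the .float() workaround: cast an FP16 tensor back to
# FP32 so its dtype matches what ops like roi_align expect.
try:
    import torch

    half = torch.zeros(2, 3, dtype=torch.float16)  # HalfTensor on GPU
    single = half.float()                          # cast back to float32
    result = str(single.dtype)
except ImportError:
    result = "torch-missing"

print(result)
```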

iasonasxrist avatar Jun 29 '22 10:06 iasonasxrist

Using a with autocast(enabled=True): scope helped me.

zhangisland avatar Sep 11 '23 09:09 zhangisland