apex
Expected tensor for argument #1 'input' to have the same type as tensor for argument #2 'rois'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor
I get this error while training a Faster R-CNN. The training loop:
for epoch in range(num_epochs):
    model.train()
    i = 0
    for imgs, annotations in data_loader:
        i += 1
        total_processed += 1
        imgs = list(img.to(device) for img in imgs)
        annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
        loss_dict = model(imgs, annotations)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        # Backpropagate once, through apex's loss scaling (needs `from apex import amp`).
        with amp.scale_loss(losses, optimizer) as scaled_loss:
            scaled_loss.backward()
        optimizer.step()
I get a RuntimeError:
warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-65-8fac2bdda8e5> in <module>()
15 # imgs = torch.as_tensor(imgs, dtype=torch.float32)
16 annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
---> 17 loss_dict = model(new_imgs, annotations)
18 losses = sum(loss for loss in loss_dict.values())
19
6 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
69 features = OrderedDict([('0', features)])
70 proposals, proposal_losses = self.rpn(images, features, targets)
---> 71 detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
72 detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
73
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/roi_heads.py in forward(self, features, proposals, image_shapes, targets)
752 matched_idxs = None
753
--> 754 box_features = self.box_roi_pool(features, proposals, image_shapes)
755 box_features = self.box_head(box_features)
756 class_logits, box_regression = self.box_predictor(box_features)
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/dist-packages/torchvision/ops/poolers.py in forward(self, x, boxes, image_shapes)
194 output_size=self.output_size,
195 spatial_scale=scales[0],
--> 196 sampling_ratio=self.sampling_ratio
197 )
198
/usr/local/lib/python3.6/dist-packages/torchvision/ops/roi_align.py in roi_align(input, boxes, output_size, spatial_scale, sampling_ratio, aligned)
43 return torch.ops.torchvision.roi_align(input, rois, spatial_scale,
44 output_size[0], output_size[1],
---> 45 sampling_ratio, aligned)
46
47
RuntimeError: Expected tensor for argument #1 'input' to have the same type as tensor for argument #2 'rois'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for ROIAlign_forward_cuda)
Environment
CUDA used to build PyTorch: 10.1
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: version 3.12.0
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 418.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.5.0+cu101
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.3.1
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect
Did you get this error fixed? I am receiving the same runtime error. The model works perfectly when I run it on its own. I receive this runtime error only when I run it with another model simultaneously.
Not yet. No answer from the apex maintainers.
I am also getting the same error. Is the error fixed?
I came across the same problem. Is there a solution now?
RuntimeError: Expected tensor for argument #1 'grad_output' to have the same type as tensor for argument #2 'weight'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_backward_input)
I also ran into this problem.
Same problem, but I fixed it by calling .float() on the offending tensor.
The error refers to the tensor's dtype: torch.float16 is the half-precision counterpart of torch.float32, and the op requires both arguments to have the same dtype.
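For anyone landing here, a minimal sketch of that cast. The tensor names `features` and `rois` and the helper `match_dtype` are made up for illustration; in the traceback above the real tensors live inside torchvision's `roi_align` call, so where you apply the cast in your own code is an assumption you will need to check.

```python
import torch

def match_dtype(features: torch.Tensor, rois: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return `features` cast to the dtype of `rois`."""
    return features.to(rois.dtype)

# Stand-ins for the mismatched pair: a float16 feature map and float32 boxes.
features = torch.randn(1, 8, 16, 16).half()
rois = torch.tensor([[0.0, 1.0, 1.0, 8.0, 8.0]])

print(features.dtype, rois.dtype)  # the two dtypes differ, which is what the op rejects

# Either cast explicitly to float32 ...
features32 = features.float()
# ... or cast to whatever dtype the other argument already has.
features_matched = match_dtype(features, rois)
```

Casting up to float32 gives up the memory savings of half precision for that tensor, so this is more of a workaround than a root-cause fix.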
Wrapping the forward pass in a `with autocast(enabled=True):` block helped me.
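A rough sketch of what that can look like with PyTorch's native mixed precision (`torch.cuda.amp`, available from PyTorch 1.6, so newer than the 1.5.0 in the original report). The `nn.Linear` model, the MSE loss, and the `train_step` function are hypothetical stand-ins, not the Faster R-CNN setup from this issue. AMP is only switched on when a GPU is present, so the snippet also runs on CPU.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Enable mixed precision only when CUDA is available.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

model = nn.Linear(4, 2).to(device)  # hypothetical stand-in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler(enabled=use_amp)  # acts as a pass-through when disabled

def train_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    # Ops inside this block run in a dtype chosen per op by autocast.
    with autocast(enabled=use_amp):
        loss = nn.functional.mse_loss(model(inputs), targets)
    # Scale the loss before backward, then step and update the scale factor.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

loss_value = train_step(
    torch.randn(8, 4, device=device),
    torch.randn(8, 2, device=device),
)
```

With this approach the model and inputs stay in float32 and there is no `amp.initialize` or `amp.scale_loss` from apex. Whether it resolves the ROIAlign mismatch for a given torchvision version is something to verify on your own setup.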