Runtime Error (GPU issue) even when cuda is available.

Open: Rohit-Kundu opened this issue 3 years ago • 1 comment

I keep getting this error:

/content/drive/MyDrive/Rohit/Work/ncnet-master
ImMatchNet training script
Namespace(batch_size=16, checkpoint='', dataset_csv_path='datasets/pf-pascal/image_pairs/', dataset_image_path='datasets/pf-pascal', fe_finetune_params=0, image_size=400, lr=0.0005, ncons_channels=[16, 16, 1], ncons_kernel_sizes=[5, 5, 5], num_epochs=5, result_model_dir='trained_models', result_model_fn='checkpoint_adam')
Creating CNN model...
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100% 171M/171M [00:00<00:00, 195MB/s]
Trainable parameters:
1: torch.Size([5, 16, 1, 5, 5, 5])
2: torch.Size([16])
3: torch.Size([5, 16, 16, 5, 5, 5])
4: torch.Size([16])
5: torch.Size([5, 1, 16, 5, 5, 5])
6: torch.Size([1])
using Adam optimizer
Checkpoint name: trained_models/2021-07-22_06:15_checkpoint_adam.pth.tar
Starting training...
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:4044: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  "Default grid_sample and affine_grid behavior has changed "
Traceback (most recent call last):
  File "train.py", line 281, in <module>
    log_interval=1,
  File "train.py", line 240, in process_epoch
    loss = loss_fn(model, batch)
  File "train.py", line 221, in <lambda>
    loss_fn = lambda model, batch: weak_loss(model, batch, normalization="softmax")
  File "train.py", line 171, in weak_loss
    corr4d = model(batch)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/Rohit/Work/ncnet-master/lib/model.py", line 263, in forward
    feature_A = self.FeatureExtraction(tnf_batch['source_image'])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/Rohit/Work/ncnet-master/lib/model.py", line 84, in forward
    features = self.model(image_batch)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

I have correctly downloaded the data and all prerequisite files. I get the same error even when I download the trained models and save them in the specified folder. I am running this on Google Colab with the GPU enabled (confirmed with torch.cuda.is_available()). I also tried setting use_cuda=True everywhere, but I still get the same error.

Rohit-Kundu, Jul 22 '21 06:07

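The error says the input is a CPU tensor (torch.FloatTensor) while the model weights are on the GPU (torch.cuda.FloatTensor), so the input has to be moved to the GPU before the forward pass.
In model.py, at line 84, you need to add image_batch = image_batch.cuda() # modify here
just before features = self.model(image_batch).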

Acmenwangtuo, Dec 12 '21 12:12
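
For anyone hitting the same RuntimeError, here is a minimal sketch of the idea behind the fix above. It is not the actual ncnet code: the FeatureExtraction class below is a hypothetical stand-in with a single convolution, and it uses .to(device) with the device read from the model's own parameters instead of a hard-coded .cuda() call, so that it also runs on a CPU-only machine. The point is only that the input batch must be on the same device as the weights before the forward pass.

```python
import torch
import torch.nn as nn


class FeatureExtraction(nn.Module):
    """Hypothetical stand-in for the feature extractor in lib/model.py."""

    def __init__(self):
        super().__init__()
        # A single conv layer is enough to reproduce the device mismatch.
        self.model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1))

    def forward(self, image_batch):
        # Look up the device the weights live on and move the input there.
        # Without this line, a CPU batch fed to GPU weights raises the
        # "Input type ... and weight type ... should be the same" error.
        device = next(self.model.parameters()).device
        image_batch = image_batch.to(device)
        return self.model(image_batch)


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = FeatureExtraction().to(device)

# A batch created on the CPU, as a DataLoader would normally produce it.
batch = torch.randn(2, 3, 16, 16)
features = net(batch)
print(features.shape, features.device)
```

With a real training loop it is usually cleaner to move the whole batch to the device once, right after it comes out of the data loader, but the in-forward version above mirrors the one-line change suggested for model.py.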