Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Open · michelle-chou25 opened this issue 2 years ago · 15 comments

Dear Yuan,

I ran into this issue when running demo.py. It occurs at line 29 of ast_models.py, self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size), with the following error message: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor. Would you mind taking a look? My environment:
timm = 0.4.5
torch = 1.10.1+cu102
torchaudio = 0.10.1+cu102
torchvision = 0.11.2+cu102

Thank you.
Best regards,
Nanjun

michelle-chou25, Oct 20 '22 02:10

Hi Nanjun,

This typically means your input and model are not on the same device (i.e., one is on the CPU and the other on the GPU), which can be solved with:

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
input = input.to(device)  # move the input tensor itself, not the model a second time
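The error message also reports a dtype mismatch (FloatTensor vs. HalfTensor), not only a device mismatch. A minimal sketch that moves an input to both the device and the dtype of a model's parameters, assuming all parameters share one device and dtype; the helper name to_model_device_and_dtype is hypothetical, and a float64 conv stands in for the half-precision CUDA weights so the demo runs on a CPU:

```python
import torch
import torch.nn as nn


def to_model_device_and_dtype(x: torch.Tensor, model: nn.Module) -> torch.Tensor:
    """Hypothetical helper: cast x to the device and dtype of the model's first parameter.

    Assumes every parameter of `model` shares a single device and dtype.
    """
    ref = next(model.parameters())
    return x.to(device=ref.device, dtype=ref.dtype)


# Patch-embedding-like conv with float64 weights, standing in for the
# torch.cuda.HalfTensor weights in the error (float64 keeps this CPU-only).
proj = nn.Conv2d(1, 4, kernel_size=2, stride=2).double()
x = torch.rand(1, 1, 8, 8)  # float32 input, as torch.rand produces by default

try:
    proj(x)  # input dtype differs from weight dtype
except RuntimeError as err:
    print("mismatch:", err)

out = proj(to_model_device_and_dtype(x, proj))
print(out.dtype, tuple(out.shape))
```

If parts of the model live on different devices, the "first parameter" assumption breaks, so this only helps once the whole model has been moved in one call.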

May I ask which demo script you are running? We have a colab demo at https://colab.research.google.com/github/YuanGongND/ast/blob/master/colab/AST_Inference_Demo.ipynb, which should be bug-free.

-Yuan

YuanGongND, Oct 20 '22 03:10

Dear Yuan,

The file I ran was src/demo.py. I also ran the Jupyter notebook demo and did not have this issue there. While debugging, I found that mlp_head.weight is on CUDA, but this error occurs when self.proj(x) is executed.

Best Regards, Nanjun

michelle-chou25, Oct 20 '22 05:10

The error is still there after I moved both the model and the input to CUDA. I'll check again after switching to a different cudatoolkit version.

michelle-chou25, Oct 20 '22 05:10

What if you run the Jupyter notebook in your own environment instead of the Google Colab one? If there is no error, then your environment is not the problem.

YuanGongND, Oct 20 '22 05:10

I think setting the pretrain flags could also help:

ast_mdl = ASTModel(label_dim=label_dim, input_tdim=input_tdim, imagenet_pretrain=False, audioset_pretrain=False)

YuanGongND, Oct 20 '22 05:10

I couldn't run the Jupyter notebook on my local machine: it said it can't find the path '/content/ast/'. It seems my IDE failed to connect to Colab.

michelle-chou25, Oct 20 '22 06:10

Yes, you need to change the file path, and maybe a few other things, to run it on a local machine.

YuanGongND, Oct 20 '22 06:10

Thank you, that solved the issue. May I know why? I also tried changing x to x.half(), and got a different error message: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper__thnn_conv2d_forward)

michelle-chou25, Oct 20 '22 06:10

I think this again means your input and model are not on the same device. Which specific method solved your issue?
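A "found at least two devices" error can also come from the model itself being split, e.g. one submodule moved to CUDA while the patch-embedding conv stays on the CPU. A minimal diagnostic sketch, assuming nothing about AST's internals: the helper group_params_by_placement is hypothetical, and the toy model casts only its head to float64 to mimic a submodule that was moved or cast separately (a CPU-only stand-in for a CPU/CUDA split):

```python
from collections import defaultdict

import torch.nn as nn


def group_params_by_placement(model: nn.Module) -> dict:
    """Hypothetical helper: map (device, dtype) -> names of parameters and buffers there."""
    groups = defaultdict(list)
    tensors = list(model.named_parameters()) + list(model.named_buffers())
    for name, t in tensors:
        groups[(str(t.device), str(t.dtype))].append(name)
    return dict(groups)


# Toy model: the final Linear ("head") is cast on its own, the rest is left alone.
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.Linear(8, 2).double())

for placement, names in group_params_by_placement(model).items():
    print(placement, names)
```

More than one key in the result means the model is split; a single model.to(device) on the whole model after construction and weight loading usually brings every parameter back to one place.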

YuanGongND, Oct 20 '22 06:10

Disabling both imagenet_pretrain and audioset_pretrain.

michelle-chou25, Oct 20 '22 07:10

The reason is that it avoids the pretrained weights being loaded to the CPU. No one has reported this input/model device issue before; maybe not many people have actually run this demo. But since you have a GPU, you could try running the ESC-50 recipe and see if the error is still there. I don't think the CUDA/torch version is the problem.
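The general pattern for keeping pretrained weights and the rest of the model on one device is to load the checkpoint with an explicit map_location and then move the whole model in a single call. This is a sketch under assumptions, not AST's actual loading code: nn.Linear and the temporary weights.pth file are stand-ins for the real model and checkpoint.

```python
import os
import tempfile

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and checkpoint (AST's own checkpoint handling is not shown here).
model = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "weights.pth")
torch.save(model.state_dict(), ckpt_path)

# Load the weights onto the CPU first, regardless of the device they were saved from...
state_dict = torch.load(ckpt_path, map_location="cpu")
restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)

# ...then move the whole model at once so no submodule is left on another device.
restored = restored.to(device)
x = torch.rand(3, 4, device=device)
print(restored(x).device)
```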

YuanGongND, Oct 20 '22 07:10

I tried it on another machine, and the error was not reproduced.

michelle-chou25, Oct 20 '22 08:10

However, I had changed line 132 in ast_models.py to
self.mlp_head = nn.Sequential(nn.LayerNorm(self.original_embedding_dim), nn.Linear(self.original_embedding_dim, label_dim)).to("cuda")
and line 18 in demo.py to
test_input = torch.rand([10, input_tdim, 128]).to("cuda").half()

michelle-chou25, Oct 20 '22 08:10

On the previous machine, the error can still be reproduced with this workaround applied.

michelle-chou25, Oct 20 '22 08:10

I see, it is a bit weird to me. Thanks for reporting this.

I actually don't think .half() is needed: even though the model was trained with half precision, it should work with any float tensor input. You can do a quick test in the Google Colab environment to check this.
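If the weights did end up in half precision (as torch.cuda.HalfTensor in the original error suggests), one option, instead of casting the input with .half(), is to cast the module back to float32 so it accepts the default torch.rand output. A minimal CPU-only sketch; the standalone Conv2d is a hypothetical stand-in for AST's patch-embedding layer:

```python
import torch
import torch.nn as nn

# Hypothetical patch-embedding-like layer whose weights are in half precision.
proj = nn.Conv2d(1, 4, kernel_size=2, stride=2).half()

# Cast parameters (and buffers) back to float32 so they match a float32 input.
proj = proj.float()

x = torch.rand(1, 1, 8, 8)  # float32 by default
out = proj(x)
print(out.dtype, tuple(out.shape))  # torch.float32 (1, 4, 4, 4)
```

Keeping everything in float32 trades some GPU memory and speed for fewer dtype mismatches; mixed precision on a GPU is usually handled with torch.autocast rather than manual .half() calls.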

Let's see if anyone else has the same issue.

YuanGongND, Oct 20 '22 08:10