
Fine-Tune Wav2Vec2 for English ASR on GCP: RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Open sully90 opened this issue 4 years ago • 16 comments

When trying to run the Notebook https://github.com/huggingface/blog/blob/main/notebooks/17_fine_tune_wav2vec2_for_english_asr.ipynb on a GCP Notebook instance I get the below error when calling trainer.train():

***** Running training *****
  Num examples = 4620
  Num Epochs = 30
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 4350
/opt/conda/lib/python3.7/site-packages/transformers/feature_extraction_utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:210.)
  tensor = as_tensor(value)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_15719/4032920361.py in <module>
----> 1 trainer.train()

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1314                         tr_loss_step = self.training_step(model, inputs)
   1315                 else:
-> 1316                     tr_loss_step = self.training_step(model, inputs)
   1317 
   1318                 if (

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in training_step(self, model, inputs)
   1845         if self.use_amp:
   1846             with autocast():
-> 1847                 loss = self.compute_loss(model, inputs)
   1848         else:
   1849             loss = self.compute_loss(model, inputs)

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
   1879         else:
   1880             labels = None
-> 1881         outputs = model(**inputs)
   1882         # Save past state if it exists
   1883         # TODO: this needs to be fixed and made cleaner later.

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values, attention_mask, output_attentions, output_hidden_states, return_dict, labels)
   1497             output_attentions=output_attentions,
   1498             output_hidden_states=output_hidden_states,
-> 1499             return_dict=return_dict,
   1500         )
   1501 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values, attention_mask, mask_time_indices, output_attentions, output_hidden_states, return_dict)
   1062         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
   1063 
-> 1064         extract_features = self.feature_extractor(input_values)
   1065         extract_features = extract_features.transpose(1, 2)
   1066 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values)
    335         hidden_states = input_values[:, None]
    336         for conv_layer in self.conv_layers:
--> 337             hidden_states = conv_layer(hidden_states)
    338 
    339         return hidden_states

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, hidden_states)
    256 
    257     def forward(self, hidden_states):
--> 258         hidden_states = self.conv(hidden_states)
    259         hidden_states = self.layer_norm(hidden_states)
    260         hidden_states = self.activation(hidden_states)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    300 
    301     def forward(self, input: Tensor) -> Tensor:
--> 302         return self._conv_forward(input, self.weight, self.bias)
    303 
    304 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    297                             _single(0), self.dilation, self.groups)
    298         return F.conv1d(input, weight, bias, self.stride,
--> 299                         self.padding, self.dilation, self.groups)
    300 
    301     def forward(self, input: Tensor) -> Tensor:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

CUDA is enabled and model is successfully loaded onto the GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P0    27W /  70W |   1370MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     15719      C   /opt/conda/bin/python            1367MiB |
+-----------------------------------------------------------------------------+

Appreciate any help!

sully90 avatar Apr 01 '22 11:04 sully90

cc @patrickvonplaten

osanseviero avatar Apr 01 '22 14:04 osanseviero

Hey @sully90,

The notebook works in Colab for me, but I haven't tested it on GCP.

From the error message, it looks like there is a problem with fp16 - as a first step, could you maybe try disabling it? E.g. remove the

fp16=True

statement?
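
For reference, a minimal sketch of the TrainingArguments with mixed precision disabled (all other values follow the notebook, where repo_name is also defined):

from transformers import TrainingArguments

# Minimal sketch: the notebook's arguments, but with mixed-precision
# training turned off to rule out the fp16/autocast path.
training_args = TrainingArguments(
  output_dir=repo_name,
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  fp16=False,  # was fp16=True
  gradient_checkpointing=True,
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
)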

patrickvonplaten avatar Apr 06 '22 10:04 patrickvonplaten

I converted the notebook to a .py file and am facing the same issue. I tried removing fp16=True but the issue persists. @patrickvonplaten, please help to solve this issue.

elites2k19 avatar Apr 20 '22 18:04 elites2k19

@sully90 Did you solve this issue?

elites2k19 avatar Apr 20 '22 18:04 elites2k19

Could you guys share a Colab notebook so that I can reproduce the error? :-) This would be great!

patrickvonplaten avatar Apr 20 '22 19:04 patrickvonplaten

I have the same issue. It worked a couple of days ago with no changes done to the code.

ericjohansson91 avatar Apr 27 '22 12:04 ericjohansson91

Getting the same issue on Colab without any changes to the notebook, i.e. the issue occurs with the original notebook. Sharing the notebook: https://colab.research.google.com/drive/18uGFjmoTVEKDI-2Nd9kgoSwQG4H-0Pzx?usp=sharing

ghost avatar May 05 '22 07:05 ghost

Hey, getting the same issue when running the standard notebook:

https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb#scrollTo=_UEjJqGsQw24

Please assist

Jesse-Parvess avatar May 05 '22 07:05 Jesse-Parvess

I can reproduce now! Thanks for telling me!

patrickvonplaten avatar May 05 '22 18:05 patrickvonplaten

Not 100% sure what the error is for now - will take a look tomorrow!

patrickvonplaten avatar May 05 '22 18:05 patrickvonplaten

Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb

Can you try it out ? :-)

patrickvonplaten avatar May 10 '22 13:05 patrickvonplaten

> Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb
>
> Can you try it out ? :-)

Works now; thanks so much @patrickvonplaten; your contribution to the open source asr community is outstanding

ghost avatar May 10 '22 14:05 ghost

> Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb
>
> Can you try it out ? :-)

Hey, could you explain exactly what the problem was? I suddenly received the same error with no changes to my code, so I am wondering whether the same fix would apply to my wav2vec project as well. I'm not sure if this was caused by a recent update.

jovan3600 avatar May 13 '22 15:05 jovan3600

The problem was that the Transformers version being used was too old. Didn't dive super deep into it though. Maybe updating your Transformers version will do the trick @jovan3600?
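
For anyone hitting this in the meantime, upgrading and double-checking the installed version would look something like this (a quick sketch):

!pip install --upgrade transformers

import transformers
print(transformers.__version__)  # should print something newer than the 4.11.3 pinned in the notebook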

patrickvonplaten avatar May 16 '22 02:05 patrickvonplaten

> The problem was that the Transformers version being used was too old. Didn't dive super deep into it though. Maybe updating your Transformers version will do the trick @jovan3600?

Yeah I tried that but unfortunately it didn't change anything. I'm not sure what else could be the problem. All notebooks I made that use wav2vec have the same error now :(

jovan3600 avatar May 16 '22 07:05 jovan3600

Hmmm, not really sure what the problem could be. In newer Transformers versions (> 4.17), whenever the runtime is set to GPU - which can be checked with torch.cuda.is_available() - the Trainer should automatically put both the inputs and the model on the GPU. Could you maybe add torch.cuda.is_available() statements before the failing call and see what they return?
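
Something along these lines (a rough sketch, using the model and trainer from the notebook):

import torch

# Rough sketch: verify that CUDA is visible and that the model's weights
# actually live on the GPU before training starts.
print(torch.cuda.is_available())        # expected: True
print(next(model.parameters()).device)  # expected: cuda:0
print(next(model.parameters()).dtype)   # torch.float16 here would explain the HalfTensor in the error

trainer.train()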

patrickvonplaten avatar May 16 '22 19:05 patrickvonplaten

Hi Patrick. I have the same problem. I tried updating both libraries, transformers and datasets, to the latest versions, and tried adding an "if torch.cuda.is_available()" check, but I still receive the same error, because CUDA is available. Are there other ways to solve the problem?

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Is there a way to put the input on CUDA?

sofidipace avatar Jan 26 '23 10:01 sofidipace
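
(As an aside, regarding the last question: tensors can be moved to the GPU manually with .to(). A minimal sketch, assuming batch is a dict of tensors as produced by the data collator - note that the Trainer is normally expected to handle this placement itself:)

import torch

# Minimal sketch: move every tensor in a collated batch onto the GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = {k: v.to(device) if isinstance(v, torch.Tensor) else v
         for k, v in batch.items()}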

Gently ping @sanchit-gandhi

patrickvonplaten avatar Jan 26 '23 18:01 patrickvonplaten

Hey @sofidipace! Could you please share:

  1. Your transformers + datasets version (run !transformers-cli env from a Colab cell)
  2. A reproducible notebook / code snippet (if possible!)

sanchit-gandhi avatar Jan 30 '23 09:01 sanchit-gandhi

Hey @sanchit-gandhi, of course! Versions that I installed (the latest ones, but I got the same error with previous versions as well): transformers 4.26.0, datasets 2.8.0. Here is the code: https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/speech_recognition.ipynb#scrollTo=tborvC9hx88e

Here is !transformers-cli env:

(Screenshot: output of !transformers-cli env)

Here is the snippet:

import torch
import numpy as np

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

from datasets import load_metric
from transformers import Wav2Vec2Processor

# processor, model_checkpoint, repo_name and timit are defined earlier in the notebook

@dataclass
class DataCollatorCTCWithPadding:
    processor: Wav2Vec2Processor
    padding: Union[bool, str] = True
    max_length: Optional[int] = None
    max_length_labels: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None
    pad_to_multiple_of_labels: Optional[int] = None

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need
        # different padding methods
        input_features = [{"input_values": feature["input_values"]} for feature in features] #list
        
        label_features = [{"input_ids": feature["labels"]} for feature in features]

        batch = self.processor.pad(
            input_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors="pt",
        )
        with self.processor.as_target_processor():
            labels_batch = self.processor.pad(
                label_features,
                padding=self.padding,
                max_length=self.max_length_labels,
                pad_to_multiple_of=self.pad_to_multiple_of_labels,
                return_tensors="pt",
            )

        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

        batch["labels"] = labels
        
        return batch

data_collator = DataCollatorCTCWithPadding(processor=processor, padding=True)
wer_metric = load_metric("wer")


def compute_metrics(pred):
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)

    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    # we do not want to group tokens when computing the metrics
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    wer = wer_metric.compute(predictions=pred_str, references=label_str)

    return {"wer": wer}

from transformers import AutoModelForCTC

model = AutoModelForCTC.from_pretrained(
    model_checkpoint, 
    ctc_loss_reduction="mean", 
    pad_token_id=processor.tokenizer.pad_token_id,
)

from transformers import TrainingArguments

training_args = TrainingArguments(
  output_dir=repo_name,
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  fp16=True,
  gradient_checkpointing=True,
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
  push_to_hub=True,
)

from transformers import Trainer

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=timit["train"],
    eval_dataset=timit["test"],
    tokenizer=processor.feature_extractor,
)

if torch.cuda.is_available():
  trainer.train()  # <------ ERROR is raised here

(Screenshot: the RuntimeError traceback)

sofidipace avatar Jan 30 '23 21:01 sofidipace

I just got it to run. I just commented out the version pins, i.e. changed !pip install datasets==1.14 to !pip install datasets and !pip install transformers==4.11.3 to !pip install transformers,

and got myself a Hugging Face token with the write role.

sofidipace avatar Feb 02 '23 09:02 sofidipace

And FYI, you now have to download TIMIT manually ;)

(Screenshot: error message saying the TIMIT dataset has to be downloaded manually)

sofidipace avatar Feb 02 '23 09:02 sofidipace

Hey @sofidipace - thanks for sharing your code! Just to confirm: you are able to run the notebook by commenting out the pinned transformers/datasets versions?

sanchit-gandhi avatar Feb 02 '23 10:02 sanchit-gandhi

Well, I ran into the problem with downloading TIMIT. That's why I switched to https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLS_R_on_Common_Voice.ipynb#scrollTo=9fRr9TG5pGBl

sofidipace avatar Feb 02 '23 10:02 sofidipace

Cool! There are over 150 datasets on the Hub you can use for ASR: https://huggingface.co/datasets?task_categories=task_categories:automatic-speech-recognition&sort=downloads

You can just change the dataset id in the load_dataset function to whichever dataset you prefer 🚀

I would personally recommend Common Voice 11: it builds on the original Common Voice corpus with more data and speakers per language.

You just need to agree to the terms of use on the Hub: https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0

And add use_auth_token=True to load_dataset:

common_voice_train = load_dataset("mozilla-foundation/common_voice_11_0", "tr", split="train+validation", use_auth_token=True)
common_voice_test = load_dataset("mozilla-foundation/common_voice_11_0", "tr", split="test", use_auth_token=True)

sanchit-gandhi avatar Feb 02 '23 10:02 sanchit-gandhi

Thank you very much @sanchit-gandhi

sofidipace avatar Feb 02 '23 11:02 sofidipace

FYI:

> And add use_auth_token=True to load_dataset:

This is not required anymore - the token is retrieved automatically if you have logged in with huggingface-cli login.
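
For example, from a notebook cell (a quick sketch):

from huggingface_hub import notebook_login

# Opens an interactive login prompt and stores the token locally;
# load_dataset will then pick it up automatically.
notebook_login()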

osanseviero avatar Feb 02 '23 11:02 osanseviero