Fine-Tune Wav2Vec2 for English ASR on GCP: RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
When trying to run the notebook https://github.com/huggingface/blog/blob/main/notebooks/17_fine_tune_wav2vec2_for_english_asr.ipynb on a GCP Notebook instance, I get the error below when calling `trainer.train()`:
***** Running training *****
Num examples = 4620
Num Epochs = 30
Instantaneous batch size per device = 32
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 1
Total optimization steps = 4350
/opt/conda/lib/python3.7/site-packages/transformers/feature_extraction_utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:210.)
tensor = as_tensor(value)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_15719/4032920361.py in <module>
----> 1 trainer.train()
/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1314 tr_loss_step = self.training_step(model, inputs)
1315 else:
-> 1316 tr_loss_step = self.training_step(model, inputs)
1317
1318 if (
/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in training_step(self, model, inputs)
1845 if self.use_amp:
1846 with autocast():
-> 1847 loss = self.compute_loss(model, inputs)
1848 else:
1849 loss = self.compute_loss(model, inputs)
/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
1879 else:
1880 labels = None
-> 1881 outputs = model(**inputs)
1882 # Save past state if it exists
1883 # TODO: this needs to be fixed and made cleaner later.
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values, attention_mask, output_attentions, output_hidden_states, return_dict, labels)
1497 output_attentions=output_attentions,
1498 output_hidden_states=output_hidden_states,
-> 1499 return_dict=return_dict,
1500 )
1501
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values, attention_mask, mask_time_indices, output_attentions, output_hidden_states, return_dict)
1062 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1063
-> 1064 extract_features = self.feature_extractor(input_values)
1065 extract_features = extract_features.transpose(1, 2)
1066
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values)
335 hidden_states = input_values[:, None]
336 for conv_layer in self.conv_layers:
--> 337 hidden_states = conv_layer(hidden_states)
338
339 return hidden_states
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, hidden_states)
256
257 def forward(self, hidden_states):
--> 258 hidden_states = self.conv(hidden_states)
259 hidden_states = self.layer_norm(hidden_states)
260 hidden_states = self.activation(hidden_states)
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
300
301 def forward(self, input: Tensor) -> Tensor:
--> 302 return self._conv_forward(input, self.weight, self.bias)
303
304
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
297 _single(0), self.dilation, self.groups)
298 return F.conv1d(input, weight, bias, self.stride,
--> 299 self.padding, self.dilation, self.groups)
300
301 def forward(self, input: Tensor) -> Tensor:
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
CUDA is enabled and the model is successfully loaded onto the GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 48C P0 27W / 70W | 1370MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 15719 C /opt/conda/bin/python 1367MiB |
+-----------------------------------------------------------------------------+
Appreciate any help!
cc @patrickvonplaten
Hey @sully90,
The notebook works on Colab for me, but I haven't tested it on GCP.
From the error message, it looks like there is a problem with fp16. As a first step, could you try disabling fp16, e.g. by removing the `fp16=True` statement?
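For reference, a minimal sketch of the notebook's training arguments with mixed precision disabled (the `output_dir` value here is a placeholder; the other argument names are taken from the notebook):

```python
from transformers import TrainingArguments

# Same arguments as in the notebook, but with mixed precision disabled:
# either drop the fp16 flag entirely or set it to False explicitly.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-timit-demo",  # placeholder; use your own repo/dir name
    group_by_length=True,
    per_device_train_batch_size=32,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=False,  # <-- disabled while debugging the dtype mismatch
    save_steps=500,
    eval_steps=500,
    logging_steps=500,
    learning_rate=1e-4,
    weight_decay=0.005,
    warmup_steps=1000,
    save_total_limit=2,
)
```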
I converted the notebook to a .py file and am facing the same issue. I tried removing `fp16=True` but the issue persists. @patrickvonplaten please help solve this issue.
@sully90 Did you solve this issue?
Could you guys make me a reproducible Colab so that I can reproduce the error? :-) This would be great!
I have the same issue. It worked a couple of days ago with no changes to the code.
Getting the same issue on Colab without any changes to the notebook, i.e. the issue occurs on the original notebook. Sharing the notebook: https://colab.research.google.com/drive/18uGFjmoTVEKDI-2Nd9kgoSwQG4H-0Pzx?usp=sharing
Hey, getting the same issue when running the standard notebook:
https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb#scrollTo=_UEjJqGsQw24
Please assist
I can reproduce now! Thanks for telling me!
Not 100% sure what the error is for now; will take a look tomorrow!
Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb
Can you try it out? :-)
> Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb
> Can you try it out? :-)
Works now; thanks so much @patrickvonplaten. Your contributions to the open-source ASR community are outstanding!
> Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb
> Can you try it out? :-)
Hey, could you explain exactly what the problem was? I suddenly received the same error with no changes to the code, so I am wondering whether the same fix applies to my wav2vec project as well. I'm not sure if this was caused by a recent update.
The problem was that the Transformers version being used was too old. I didn't dive super deep into it though. Maybe updating your Transformers version will do the trick, @jovan3600?
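For reference, upgrading from a Colab cell is a one-liner:

```python
# Upgrade transformers (and optionally datasets) to the latest release
!pip install --upgrade transformers datasets
```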
> The problem was that the Transformers version being used was too old. I didn't dive super deep into it though. Maybe updating your Transformers version will do the trick, @jovan3600?
Yeah, I tried that, but unfortunately it didn't change anything. I'm not sure what else could be the problem. All the notebooks I made that use wav2vec have the same error now :(
Hmmm, not really sure what could be the problem. In newer Transformers versions (> 4.17), whenever the runtime is set to GPU (which can be checked with `torch.cuda.is_available()`), the Trainer should automatically put both the inputs and the model on the GPU. Could you maybe add `torch.cuda.is_available()` checks before the failing call and see what they return?
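A minimal sketch of such a check, assuming the `model`, `data_collator`, and processed `timit` dataset from the notebook are in scope:

```python
import torch

# 1. Does the runtime see a GPU at all?
print(torch.cuda.is_available())  # expected: True

# 2. Where do the model weights live, and in which dtype?
param = next(model.parameters())
print(param.device, param.dtype)  # e.g. cuda:0 torch.float32

# 3. What does one collated batch look like before the Trainer touches it?
batch = data_collator([timit["train"][0]])
print(batch["input_values"].device, batch["input_values"].dtype)
```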
Hi Patrick. I have the same problem. I tried updating both transformers and datasets to their latest versions, and I tried adding an `if torch.cuda.is_available():` check. But I still receive the same error, because CUDA is available:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Is there a way to move the "input" to CUDA?
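As a debugging sketch (the Trainer normally handles device placement itself), you could move one batch to the GPU manually and try a forward pass, assuming the `model`, `data_collator`, and processed `timit` dataset from the notebook:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Collate one example and move every tensor in the batch onto the GPU.
batch = data_collator([timit["train"][0]])
batch = {k: v.to(device) for k, v in batch.items()}

# If this forward pass runs, the model/device setup is fine and the problem
# lies in how the Trainer (or an old library version) feeds the batches.
with torch.no_grad():
    outputs = model(**batch)
print(outputs.loss)
```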
Gently ping @sanchit-gandhi
Hey @sofidipace! Could you please share:
- Your transformers + datasets versions (run `!transformers-cli env` from a Colab cell)
- A reproducible notebook / code snippet (if possible!)
Hey @sanchit-gandhi, of course! Versions that I installed (the latest ones, but I got the same error with previous versions as well): transformers 4.26.0, datasets 2.8.0. Here is the code: https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/speech_recognition.ipynb#scrollTo=tborvC9hx88e
Here is the `!transformers-cli env` output:
Here is the snippet:
```python
import numpy as np
import torch
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

from datasets import load_metric
from transformers import Wav2Vec2Processor

# `processor`, `model_checkpoint`, `repo_name` and `timit` are defined in
# earlier cells of the notebook.


@dataclass
class DataCollatorCTCWithPadding:
    processor: Wav2Vec2Processor
    padding: Union[bool, str] = True
    max_length: Optional[int] = None
    max_length_labels: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None
    pad_to_multiple_of_labels: Optional[int] = None

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths
        # and need different padding methods
        input_features = [{"input_values": feature["input_values"]} for feature in features]
        label_features = [{"input_ids": feature["labels"]} for feature in features]

        batch = self.processor.pad(
            input_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors="pt",
        )
        with self.processor.as_target_processor():
            labels_batch = self.processor.pad(
                label_features,
                padding=self.padding,
                max_length=self.max_length_labels,
                pad_to_multiple_of=self.pad_to_multiple_of_labels,
                return_tensors="pt",
            )

        # replace padding with -100 to ignore padded tokens in the loss
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)
        batch["labels"] = labels

        return batch


data_collator = DataCollatorCTCWithPadding(processor=processor, padding=True)

wer_metric = load_metric("wer")


def compute_metrics(pred):
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)

    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    # we do not want to group tokens when computing the metrics
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    wer = wer_metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}


from transformers import AutoModelForCTC

model = AutoModelForCTC.from_pretrained(
    model_checkpoint,
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
)

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=repo_name,
    group_by_length=True,
    per_device_train_batch_size=32,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=True,
    gradient_checkpointing=True,
    save_steps=500,
    eval_steps=500,
    logging_steps=500,
    learning_rate=1e-4,
    weight_decay=0.005,
    warmup_steps=1000,
    save_total_limit=2,
    push_to_hub=True,
)

from transformers import Trainer

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=timit["train"],
    eval_dataset=timit["test"],
    tokenizer=processor.feature_extractor,
)

if torch.cuda.is_available():
    trainer.train()  # <------ ERROR
```
I just got it to run. I commented out the pinned versions (`!pip install datasets` ~~`==1.14`~~ and `!pip install transformers` ~~`==4.11.3`~~) and got myself a Hugging Face write-role token.
And FYI, you now have to download TIMIT manually ;)
Hey @sofidipace, thanks for sharing your code! Can you confirm that you are able to run the notebook by commenting out the pinned transformers/datasets versions?
Well, I ran into the problem of downloading TIMIT. That's why I switched to https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLS_R_on_Common_Voice.ipynb#scrollTo=9fRr9TG5pGBl
Cool! There are over 150 datasets on the Hub you can use for ASR: https://huggingface.co/datasets?task_categories=task_categories:automatic-speech-recognition&sort=downloads
You can just change the dataset id in the `load_dataset` function to whichever dataset you prefer 🚀
I would personally recommend Common Voice 11: it builds on the original Common Voice corpus with more data and more speakers per language.
You just need to agree to the terms of use on the Hub: https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0
And add `use_auth_token=True` to `load_dataset`:

```python
common_voice_train = load_dataset("mozilla-foundation/common_voice_11_0", "tr", split="train+validation", use_auth_token=True)
common_voice_test = load_dataset("mozilla-foundation/common_voice_11_0", "tr", split="test", use_auth_token=True)
```
Thank you very much @sanchit-gandhi
FYI
> And add `use_auth_token=True` to `load_dataset`:

This is not required anymore; the token is retrieved automatically if you have logged in with `huggingface-cli login`.
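For completeness, logging in once per environment looks like this; `notebook_login` from `huggingface_hub` is the notebook-friendly variant:

```python
# Option 1: from a terminal (or a notebook cell with a leading "!"):
#   huggingface-cli login

# Option 2: interactively from Python, e.g. in Colab:
from huggingface_hub import notebook_login

notebook_login()
```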