Fine-tuning Whisper RuntimeError
My code (imports shown for completeness; these are the modules the snippet below uses):

import evaluate
from accelerate import dispatch_model
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import (WhisperFeatureExtractor, WhisperForConditionalGeneration,
                          WhisperProcessor, WhisperTokenizer)

preprocessing_only=False
do_lower_case = False
do_remove_punctuation = False
max_input_length = 30.0
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v2")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2", language="zh", task="transcribe")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2", language="zh", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2", load_in_8bit=True, device_map="auto")
device_map = model.hf_device_map.copy()
device_map["model.decoder.embed_tokens"] = model._hf_hook.execution_device
device_map["model.decoder.embed_positions"] = model._hf_hook.execution_device
device_map["proj_out"] = model._hf_hook.execution_device
dispatch_model(model, device_map=device_map)
model.hf_device_map
model.config.suppress_tokens = []
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
model = prepare_model_for_int8_training(model, output_embedding_layer_name="proj_out")
metric = evaluate.load("cer")
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=".*decoder.*(self_attn|encoder_attn).*(q_proj|v_proj)$",  # or ["q_proj", "v_proj"]
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
Errors:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/ybZhang/miniconda3/envs/whister did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 114
/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
trainable params: 10485760 || all params: 1553790720 || trainable%: 0.6748502140622902
{'model.encoder': 0, 'model.decoder.embed_tokens': 0, 'proj_out': 0, 'model.decoder.embed_positions': 0, 'model.decoder.layers.0': 0, 'model.decoder.layers.1': 0, 'model.decoder.layers.2': 0, 'model.decoder.layers.3': 0, 'model.decoder.layers.4': 0, 'model.decoder.layers.5.self_attn': 0, 'model.decoder.layers.5.activation_fn': 0, 'model.decoder.layers.5.self_attn_layer_norm': 0, 'model.decoder.layers.5.encoder_attn.k_proj': 0, 'model.decoder.layers.5.encoder_attn.v_proj': 0, 'model.decoder.layers.5.encoder_attn.q_proj': 0, 'model.decoder.layers.5.encoder_attn.out_proj': 1, 'model.decoder.layers.5.encoder_attn_layer_norm': 1, 'model.decoder.layers.5.fc1': 1, 'model.decoder.layers.5.fc2': 1, 'model.decoder.layers.5.final_layer_norm': 1, 'model.decoder.layers.6': 1, 'model.decoder.layers.7': 1, 'model.decoder.layers.8': 1, 'model.decoder.layers.9': 1, 'model.decoder.layers.10': 1, 'model.decoder.layers.11': 1, 'model.decoder.layers.12': 1, 'model.decoder.layers.13': 1, 'model.decoder.layers.14': 1, 'model.decoder.layers.15': 1, 'model.decoder.layers.16': 1, 'model.decoder.layers.17': 1, 'model.decoder.layers.18': 1, 'model.decoder.layers.19': 1, 'model.decoder.layers.20': 1, 'model.decoder.layers.21': 1, 'model.decoder.layers.22': 1, 'model.decoder.layers.23': 1, 'model.decoder.layers.24': 1, 'model.decoder.layers.25': 1, 'model.decoder.layers.26': 1, 'model.decoder.layers.27': 1, 'model.decoder.layers.28': 1, 'model.decoder.layers.29': 1, 'model.decoder.layers.30': 1, 'model.decoder.layers.31': 1, 'model.decoder.layer_norm': 1}
<datasets.iterable_dataset.IterableDataset object at 0x7f9ed6d4a0a0>
/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/1500 [00:00<?, ?it/s]Traceback (most recent call last):
File "finetune.py", line 176, in <module>
whisper_finetune(traindir,devdir,outdir)
File "finetune.py", line 171, in whisper_finetune
trainer.train()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2645, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2677, in compute_loss
outputs = model(**inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 157, in forward
raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
0%| |
After updating transformers from 4.27 to 4.28.0.dev0, I get the following error:
/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/1500 [00:00<?, ?it/s]/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "finetune.py", line 176, in <module>
whisper_finetune(traindir,devdir,outdir)
File "finetune.py", line 171, in whisper_finetune
trainer.train()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1659, in train
return inner_training_loop(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1926, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2706, in training_step
self.scaler.scale(loss).backward()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 456, in backward
grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected scalar type Half but found Float
0%|
Hi @v-yunbin Can you share the full training script with us? Thanks!
Downgraded to transformers 4.27. Training code:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
do_lower_case = False
do_remove_punctuation = False
max_input_length = 30.0
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v2")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2", language="zh", task="transcribe")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2", language="zh", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2", load_in_8bit=True, device_map={"":0})
device_map = model.hf_device_map.copy()
device_map["model.decoder.embed_tokens"] = model._hf_hook.execution_device
device_map["model.decoder.embed_positions"] = model._hf_hook.execution_device
device_map["proj_out"] = model._hf_hook.execution_device
dispatch_model(model, device_map=device_map)
model.hf_device_map
model.config.suppress_tokens = []
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
model = prepare_model_for_int8_training(model, output_embedding_layer_name="proj_out")
metric = evaluate.load("cer")
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=".*decoder.*(self_attn|encoder_attn).*(q_proj|v_proj)$",  # or ["q_proj", "v_proj"]
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
print(model.hf_device_map)
def main():
    we_voice = asr_dataset(traindir, devdir)
    we_voices = we_voice.map(
        prepare_dataset,
        remove_columns=list(next(iter(we_voice.values())).features),
    ).with_format("torch")
    we_voices["train"] = we_voices["train"].shuffle(buffer_size=500, seed=0)
    we_voices["train"] = we_voices["train"].filter(is_audio_in_length_range, input_columns=["input_length"])
    data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)
    training_args = Seq2SeqTrainingArguments(
        output_dir=outdir,  # change to a repo name of your choice
        per_device_train_batch_size=8,
        gradient_accumulation_steps=1,  # increase by 2x for every 2x decrease in batch size
        learning_rate=1e-5,
        warmup_steps=500,
        num_train_epochs=10,
        max_steps=1500,
        gradient_checkpointing=True,
        fp16=True,
        evaluation_strategy="epoch",
        per_device_eval_batch_size=8,
        #predict_with_generate=True,
        generation_max_length=225,
        logging_steps=25,
        report_to=["tensorboard"],
        #load_best_model_at_end=True,
        metric_for_best_model="cer",
        greater_is_better=False,
        push_to_hub=False,
        remove_unused_columns=False,
        label_names=["labels"],
    )

    class SavePeftModelCallback(TrainerCallback):
        def on_save(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
            checkpoint_folder = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")
            peft_model_path = os.path.join(checkpoint_folder, "adapter_model")
            kwargs["model"].save_pretrained(peft_model_path)
            pytorch_model_path = os.path.join(checkpoint_folder, "pytorch_model.bin")
            if os.path.exists(pytorch_model_path):
                os.remove(pytorch_model_path)
            return control

    trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=we_voices["train"],
        eval_dataset=we_voices["test"],
        data_collator=data_collator,
        #compute_metrics=compute_metrics,
        tokenizer=processor.feature_extractor,
        callbacks=[SavePeftModelCallback],
    )
    model.config.use_cache = False
    trainer.train()
I changed

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2", load_in_8bit=True, device_map="auto")

to

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2", load_in_8bit=True, device_map={"": 0})

but get the following error:
Traceback (most recent call last):
File "finetune.py", line 172, in <module>
whisper_finetune(traindir,devdir,outdir)
File "finetune.py", line 167, in whisper_finetune
trainer.train()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2645, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2677, in compute_loss
outputs = model(**inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/peft/peft_model.py", line 295, in forward
return self.get_base_model()(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 1409, in forward
outputs = self.model(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 1258, in forward
encoder_outputs = self.encoder(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 849, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 845, in custom_forward
return module(*inputs, output_attentions)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 424, in forward
hidden_states, attn_weights, _ = self.self_attn(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 309, in forward
value_states = self._shape(self.v_proj(hidden_states), -1, bsz)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (12000x1 and 2x1280)
0%|
I have a similar issue. When I run CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py, I get exactly the same error:
RuntimeError: expected scalar type Half but found Float
Setting device_map={"":0} solves that, but then I hit a new error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (12000x1 and 2x1280)
Yes, in my case it gives a similar issue:
"RuntimeError: mat1 and mat2 shapes cannot be multiplied (1544x4 and 2x4096)"
This is the issue: File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply. You cannot apply DataParallel when using accelerate's device_map. You need to make sure the notebook or the script only has access to a single GPU.
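One minimal way to do that (my own sketch, not from the original script) is to hide the extra GPUs before anything initializes CUDA:

import os

# set this at the very top of the script, before importing torch/transformers,
# so the process only ever sees GPU 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0"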
Alternatively, please set the attributes below so the Trainer avoids DataParallel, since this is naive pipeline parallelism, i.e. model parallelism (see the sketch after these two lines):
setattr(model, 'model_parallel', True)
setattr(model, 'is_parallelizable', True)
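A minimal sketch of where these would go relative to the script above (the placement is my assumption, not taken from the original code):

# after wrapping the 8-bit model with LoRA, mark it as model-parallel so the
# Trainer treats it as already placed across devices and skips DataParallel
model = get_peft_model(model, config)
setattr(model, "model_parallel", True)
setattr(model, "is_parallelizable", True)
model.print_trainable_parameters()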
Note that the fix for this is already in transformers, so you can also do:
pip install git+https://github.com/huggingface/transformers.git
@pacman100 As I mentioned in the ticket (P.S. section), I already use the latest transformers and I also tried setting these attributes. That still results in RuntimeError: expected scalar type Half but found Float.

Has anyone managed to run the Whisper fine-tuning recipe successfully?
Hi @v-yunbin @nd7141
Can you share more details about your environment? What are your peft, accelerate and bitsandbytes versions?
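For example, one generic way to collect them (my own suggestion):

from importlib.metadata import version

# print the installed versions of the three libraries in question
print("peft:", version("peft"))
print("accelerate:", version("accelerate"))
print("bitsandbytes:", version("bitsandbytes"))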
peft 0.2.0
accelerate 0.18.0
bitsandbytes 0.37.2
@v-yunbin can you use the main branch of peft and let us know if it works?
pip install git+https://github.com/huggingface/peft.git
@v-yunbin @nd7141
I managed to reproduce this issue (https://github.com/huggingface/peft/issues/269#issuecomment-1498776567) in a multi-GPU env; however, the script works fine when I run it with CUDA_VISIBLE_DEVICES=0 xxx
Here are more details about my env:
accelerate=='0.18.0.dev0'
transformers=='4.28.0.dev0'
peft=='0.3.0.dev0'
Can you double check by installing all these libraries from source?
pip install git+https://github.com/huggingface/peft.git
pip install git+https://github.com/huggingface/accelerate.git
pip install git+https://github.com/huggingface/transformers.git
I followed your steps and installed everything from source, but it still does not work for me.
accelerate==0.18.0.dev0
transformers==4.28.0.dev0
peft==0.3.0.dev0
Hi @v-yunbin @nd7141
I managed to fix the issue that you have described: https://github.com/huggingface/peft/issues/269#issuecomment-1498795345
This was due to 8-bit models being converted to DistributedDataParallel; I can confirm the tests pass with that fix.
You can install it right now with
pip install git+https://github.com/huggingface/transformers.git
Related: https://github.com/huggingface/transformers/pull/22628
@younesbelkada It still does not work. My code is as follows:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
dummy_accelerator = Accelerator()
current_device = dummy_accelerator.process_index
device_map={"":current_device}
do_lower_case = False
do_remove_punctuation = False
max_input_length = 30.0
metric = evaluate.load("cer")
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v2")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2", language="zh", task="transcribe")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2", language="zh", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2", load_in_8bit=True, device_map=device_map)
device_map = model.hf_device_map.copy()
device_map["model.decoder.embed_tokens"] = model._hf_hook.execution_device
device_map["model.decoder.embed_positions"] = model._hf_hook.execution_device
device_map["proj_out"] = model._hf_hook.execution_device
dispatch_model(model, device_map=device_map)
model.hf_device_map
model.config.suppress_tokens = []
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
#model = prepare_model_for_int8_training(model, output_embedding_layer_name="proj_out")
model = prepare_model_for_int8_training(model, output_embedding_layer_name="proj_out", layer_norm_names=[])
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=".*decoder.*(self_attn|encoder_attn).*(q_proj|v_proj)$",  # or ["q_proj", "v_proj"]
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
Detailed error:
Traceback (most recent call last):
File "finetune.py", line 180, in <module>
whisper_finetune(traindir,devdir,outdir)
File "finetune.py", line 175, in whisper_finetune
trainer.train()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1659, in train
return inner_training_loop(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1926, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2696, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2728, in compute_loss
outputs = model(**inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/peft/peft_model.py", line 316, in forward
return self.get_base_model()(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 1414, in forward
outputs = self.model(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 1263, in forward
encoder_outputs = self.encoder(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 851, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 847, in custom_forward
return module(*inputs, output_attentions)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 426, in forward
hidden_states, attn_weights, _ = self.self_attn(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py", line 285, in forward
query_states = self.q_proj(hidden_states) * self.scaling
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (6000x4 and 3x1280)
In the traceback I can see
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
This means that DistributedDataParallel is still being used. Can you make sure that the commit https://github.com/huggingface/transformers/pull/22628 is indeed included in your transformers package? Can you maybe uninstall transformers and re-install it with the command I shared?
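If it helps, a quick, generic way to check which transformers build is actually installed (not specific to this fix):

import transformers

# a source install from main should report a .dev0 version, e.g. 4.28.0.dev0
print(transformers.__version__)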
After uninstalling and re-installing transformers, that error disappears, but I get the previous error again:
Traceback (most recent call last):
File "finetune.py", line 180, in <module>
whisper_finetune(traindir,devdir,outdir)
File "finetune.py", line 175, in whisper_finetune
trainer.train()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/trainer.py", line 2709, in training_step
self.scaler.scale(loss).backward()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 456, in backward
grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected scalar type Half but found Float
Strangely, I can't reproduce this. Can you do the same thing with peft, i.e. uninstall peft and re-install it with:
pip install git+https://github.com/huggingface/peft
I uninstalled peft, accelerate and transformers and reinstalled them all, but I still get the same errors. Does this have anything to do with the bitsandbytes version (mine is 0.37.2)?
@younesbelkada Setting load_in_8bit=False makes it work. Why?
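For reference, the only change is the loading call (a sketch; all other arguments stay as in the script above):

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    load_in_8bit=False,   # disabling the 8-bit load is the only change; training then runs
    device_map=device_map,
)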
I can confirm that it only works with load_in_8bit=False.
Me too, but now it cannot fit into the Colab free T4 GPU.