whisperX
AttributeError: 'Wav2Vec2Processor' object has no attribute 'sampling_rate'
Hello, I have a simple project testing out whisperX. The test script:
import json
import logging
import whisperx
model_opts = {
    "whisper_arch": "large-v2",
    "device": "cuda",
    "compute_type": "float16",
    "download_root": "/home/user/.config/whisper-models",
    "language": "ja",
}

trans_opts = {
    "temperatures": [0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0],
    "best_of": 5,
    "beam_size": 5,
    "patience": 2,
    "initial_prompt": None,
    "condition_on_previous_text": True,
    "compression_ratio_threshold": 2.4,
    "log_prob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "word_timestamps": False,
    "prepend_punctuations": "\"'“¿([{-",
    "append_punctuations": "\"'.。,,!!??::”)]}、",
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None,
}

filename = '/mnt/media/test.mkv'
model = whisperx.load_model(**model_opts, asr_options=trans_opts)
audio = whisperx.load_audio(filename)
results = model.transcribe(audio, batch_size=16)
device = 'cuda'
# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(
    language_code=results["language"],
    device=device,
)
results = whisperx.align(results["segments"], model_a, metadata, audio, device, return_char_alignments=False)
logging.debug(json.dumps(results, indent=2, ensure_ascii=False))
leads to:
/home/user/test/.venv/lib/python3.11/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.0.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.1+cu121. Bad things might happen unless you revert torch to 1.x.
Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-japanese were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-japanese and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "/home/user/test/test.py", line 54, in <module>
results = whisperx.align(results["segments"], model_a, metadata, audio, device, return_char_alignments=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/test/.venv/lib/python3.11/site-packages/whisperx/alignment.py", line 232, in align
inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Wav2Vec2Processor' object has no attribute 'sampling_rate'
I am unable to get it working at all. Testing just faster-whisper works OK, so it seems there is a problem with the Wav2Vec2 model.
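For what it's worth, the error itself happens because the sampling rate lives on the processor's feature extractor rather than on the Wav2Vec2Processor object. A minimal check, using the same jonatasgrosman/wav2vec2-large-xlsr-53-japanese checkpoint that shows up in the log above:

from transformers import Wav2Vec2Processor

# Same alignment checkpoint that whisperX loads for Japanese (per the warnings above).
processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-japanese")

print(hasattr(processor, "sampling_rate"))        # False on the transformers versions that hit this error
print(processor.feature_extractor.sampling_rate)  # 16000, where the value actually lives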
I have the same issue. Has anyone gotten around this? It might be caused by some breaking changes in transformers; I'll try downgrading transformers.
I finally solved this error by rewriting alignment.py like this:
- inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+ inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt").to(device)
Thanks, I've made a small patch file that makes it backwards compatible:
--- .venv/lib/python3.11/site-packages/whisperx/alignment.py 2024-03-03 17:22:05.042130573 +0300
+++ .venv/lib/python3.11/site-packages/whisperx/alignment.py 2024-03-03 17:25:20.760972944 +0300
@@ -229,7 +229,13 @@
                 emissions, _ = model(waveform_segment.to(device), lengths=lengths)
             elif model_type == "huggingface":
                 if preprocess:
-                    inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+                    sample_rate = None
+                    if 'sampling_rate' in processor.__dict__:
+                        sample_rate = processor.sampling_rate
+                    if 'feature_extractor' in processor.__dict__ and 'sampling_rate' in processor.feature_extractor.__dict__:
+                        sample_rate = processor.feature_extractor.sampling_rate
+
+                    inputs = processor(waveform_segment.squeeze(), sampling_rate=sample_rate, return_tensors="pt").to(device)
                 emissions = model(**inputs).logits
             else:
                 emissions = model(waveform_segment.to(device)).logits
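For anyone who would rather not poke at __dict__ directly, the same version-tolerant lookup can be written with getattr. This is only a standalone sketch of the patch's logic (the helper name is made up), not code shipped with whisperX:

def resolve_sampling_rate(processor, default=16000):
    """Find the input sampling rate on either the processor or its feature extractor.

    Works whether `processor` is a bare feature extractor (older layouts) or a
    Wav2Vec2Processor wrapping one (current transformers). Falls back to 16 kHz,
    which is what the wav2vec2 alignment models expect.
    """
    rate = getattr(processor, "sampling_rate", None)
    if rate is None:
        extractor = getattr(processor, "feature_extractor", None)
        rate = getattr(extractor, "sampling_rate", None)
    return rate if rate is not None else default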
How did you solve it? I tried to find the code you mentioned, but it doesn't exist.
I also don't see the code referenced above.
@alfahadgm @melanie-rosenberg, I am unsure why, but this fix is intended for v3.1.2, which it seems has been removed from the repo for some reason.
Maybe @m-bain can shed some light on why.
Thank you @arabcoders -- applying the patch worked while using v3.1.2.
FYI @alfahadgm, running this also worked:
pip install -U git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560
Here's some info about the PyPI release vs. this repo, in case anyone else is confused like I was: it seems the PyPI releases are created by someone other than the maintainer of this repo, according to https://github.com/m-bain/whisperX/issues/700#issuecomment-1957790696. The above patch works on top of this PR: https://github.com/m-bain/whisperX/pull/625.
@HHousen Any chance you could submit a PR to get that change merged?