speechlib
transcription in logs file is empty
Hi,
thank you for your work, but I am having issues.
There's no error, but after running your example I get an almost empty file in logs.
The file contains only the following string:
zach (206.8 : 206.8) :
In the terminal there are no errors:
(speechlib39) piotr@Legion7:~/speechlib/examples$ python3 transcribe.py
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
obama_zach.wav is already in WAV format.
obama_zach.wav is already a mono audio file.
The file already has 16-bit samples.
config.yaml: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 292kB/s]
pytorch_model.bin: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17.7M/17.7M [00:00<00:00, 19.4MB/s]
config.yaml: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 318/318 [00:00<00:00, 36.2kB/s]
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.0+cu121. Bad things might happen unless you revert torch to 1.x.
running diarization...
diarization done. Time taken: 17 seconds.
running speaker recognition...
speaker recognition done. Time taken: 4 seconds.
running transcription...
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.26k/2.26k [00:00<00:00, 660kB/s]
vocabulary.txt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 460k/460k [00:00<00:00, 1.02MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.20M/2.20M [00:00<00:00, 3.03MB/s]
model.bin: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.53G/1.53G [00:58<00:00, 26.0MB/s]
Cannot check for SPDIF
transcription done. Time taken: 140 seconds.
(speechlib39) piotr@Legion7:~/speechlib/examples$ ls
README.md audio_cache logs obama1.mp3 obama1.wav obama_zach.wav preprocess.py pretrained_models segments temp transcribe.py voices
(speechlib39) piotr@Legion7:~/speechlib/examples$ python3 transcribe.py
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
obama_zach.wav is already in WAV format.
obama_zach.wav is already a mono audio file.
The file already has 16-bit samples.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.0+cu121. Bad things might happen unless you revert torch to 1.x.
running diarization...
diarization done. Time taken: 14 seconds.
running speaker recognition...
speaker recognition done. Time taken: 4 seconds.
running transcription...
Cannot check for SPDIF
transcription done. Time taken: 82 seconds.
The content of the file is the same as before.
I have Python 3.9 in a clean conda env. Whisper works flawlessly.
- did you run the same example in this repo? if not, post the code.
- what model size did you use?
- did you input the paths to the obama_zach file correctly?
- can you run this in a normal python environment instead of conda and tell me if the error persists?
Ad 1. Yes, I've run the same example, without any changes. I use diarize.py:
~/speechlib/examples$ python3 transcribe.py
obama_zach_143156_en.txt
Ad 2. I use medium.
Ad 3. Yes, it processes the file. It takes time: 79 seconds, to be precise.
Ad 4. Sure, I'll have to prepare a clean WSL VM.
This can happen for a number of reasons, because of an insane try/except block in this function.
It literally says:
try:
    trans = transcribe(file, language, modelSize, quantization)
    # return -> [[start time, end time, transcript], [start time, end time, transcript], ..]
    texts.append([segment[0], segment[1], trans])
except:
    pass
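For comparison, a version of that block that surfaces the failure instead of hiding it might look like this (just a sketch reusing the names from the snippet above, which are speechlib internals):

import traceback

try:
    trans = transcribe(file, language, modelSize, quantization)
    texts.append([segment[0], segment[1], trans])
except Exception:
    # print the real failure instead of silently producing
    # an empty transcript, then re-raise so the run fails loudly
    traceback.print_exc()
    raise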
I removed this via a monkeypatch and it revealed the actual issue:
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
This is a common issue for faster-whisper and is discussed here: https://github.com/SYSTRAN/faster-whisper/issues/42
There may be a different error in your case.
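If it is the float16 one, the usual workaround from the linked thread is to request a compute type the device actually supports. A minimal sketch calling faster-whisper directly (this assumes you can reproduce the problem outside speechlib; whether speechlib exposes compute_type is a separate question):

from faster_whisper import WhisperModel

# fall back to int8 on CPU (or "float32" on GPUs without efficient fp16)
model = WhisperModel("medium", device="cpu", compute_type="int8")

segments, info = model.transcribe("obama_zach.wav", language="en")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")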
I'm having the same problem, and it could be partially solved with
https://github.com/NavodPeiris/speechlib/issues/37
In the meantime, I'll try to create a branch in my fork that doesn't use faster-whisper.
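Until then, one way to sanity-check the audio without faster-whisper is plain openai-whisper; a minimal sketch (note this skips speechlib's diarization, so you only get the raw transcript, not per-speaker segments):

import whisper

# plain openai-whisper as a stand-in for faster-whisper
model = whisper.load_model("medium")
result = model.transcribe("obama_zach.wav", language="en")
print(result["text"])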
I am getting an empty file at the end when I use the Sinhala language. I know that in the codebase we provide a different model for Sinhala than normal Whisper. Can you please help me with this?