Improve error handling: No active speech found in audio
openlrc version: 1.5.2
When trying to transcribe a video that has no human voice, the run fails with `RuntimeError: stack expects a non-empty TensorList`.
I found the following text in the log:
[2024-09-19 22:48:52] INFO [Producer_0] Audio length: /home/user00/gitspace/video_tools/.data/no-speech/preprocessed/no-speech_preprocessed.wav: 00:25:14,243
No active speech found in audio
Is it possible for openlrc to handle this situation and end the transcription task early? Generating an empty subtitle file and returning its path as usual might be a reasonable way to deal with it.
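For now I work around it on the caller side by catching the error and writing an empty subtitle myself. A minimal sketch; the `LRCer` constructor arguments mirror the preview below but the parameter names are from memory, the `run()` arguments come from the traceback, and the output-path convention is an assumption:

```python
from pathlib import Path

from openlrc import LRCer

def transcribe_or_empty(audio_path: str) -> Path:
    """Transcribe with openlrc; on silent audio, emit an empty subtitle instead."""
    lrcer = LRCer(whisper_model='tiny', compute_type='int8')  # parameter names assumed
    subtitle = Path(audio_path).with_suffix('.srt')  # assumed output naming
    try:
        lrcer.run(audio_path, skip_trans=True, clear_temp=True)
    except RuntimeError as e:
        # faster-whisper raises this when VAD leaves no segments to stack
        if 'non-empty TensorList' not in str(e):
            raise
        subtitle.write_text('', encoding='utf-8')  # no speech -> empty subtitle
    return subtitle
```

Full log: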
2024-09-19 22:48:16.532 | INFO | video_tools.transcribe.base_transcriber:preview:93 - preview transcribe task:
TranscribeMetadata(
│ params=TranscribeParams(model='tiny', device='cpu', compute_type='int8'),
│ audios=[
│ │ AudioMetadata(path=PosixPath('/home/user00/gitspace/video_tools/.data/no-speech/no-speech.mp4'), hash='6e8b9718e3f6c6f60be6c25f766e3da885995f557d541989a341896feff6d505', subtitle=None, error=None)
│ ]
)
2024-09-19 22:48:16.622 | INFO | video_tools.transcribe.base_transcriber:preview:95 - total audios num: 1
Do you want to continue? [y/N]: y
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.4.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint .venv/lib/python3.11/site-packages/faster_whisper/assets/pyannote_vad_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.3.2. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.
[2024-09-19 22:48:18] INFO [MainThread] File /home/user00/gitspace/video_tools/.data/no-speech/no-speech.mp4: Audio sample rate: 44100
[2024-09-19 22:48:19] INFO [MainThread] Loudness normalizing...
[2024-09-19 22:48:19] INFO [MainThread] Normalizing file no-speech.wav (1 of 1)
[2024-09-19 22:48:19] INFO [MainThread] Running first pass loudnorm filter for stream 0
[2024-09-19 22:48:48] INFO [MainThread] Running second pass for /home/user00/gitspace/video_tools/.data/no-speech/no-speech.wav
[2024-09-19 22:48:52] INFO [MainThread] Normalized file written to /home/user00/gitspace/video_tools/.data/no-speech/preprocessed/no-speech_ln.wav
[2024-09-19 22:48:52] INFO [MainThread] Preprocessed audio saved to /home/user00/gitspace/video_tools/.data/no-speech/preprocessed/no-speech_preprocessed.wav
[2024-09-19 22:48:52] INFO [MainThread] Working on 1 audio files: [PosixPath('/home/user00/gitspace/video_tools/.data/no-speech/preprocessed/no-speech_preprocessed.wav')]
[2024-09-19 22:48:52] INFO [MainThread] Start Transcription (Producer) and Translation (Consumer) process
[2024-09-19 22:48:52] INFO [Producer_0] Start Transcription process
[2024-09-19 22:48:52] INFO [Producer_0] Audio length: /home/user00/gitspace/video_tools/.data/no-speech/preprocessed/no-speech_preprocessed.wav: 00:25:14,243
No active speech found in audio
[2024-09-19 22:49:24] INFO [Producer_0] Detected language: en (0.58) in first 30s of audio...
[2024-09-19 22:49:24] INFO [Producer_0] Transcription process Elapsed: 31.53s
[2024-09-19 22:49:24] INFO [MainThread] Transcription (Producer) and Translation (Consumer) process Elapsed: 31.53s
Traceback (most recent call last):
File "/home/user00/gitspace/video_tools/video_tools/main.py", line 6, in <module>
fire.Fire(OpenLRCTranscriber)
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/video_tools/transcribe/base_transcriber.py", line 125, in run
return self._transcribe()
^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/video_tools/transcribe/openlrc_transcriber.py", line 13, in _transcribe
return self._lrcer.run(self._audios, skip_trans=True, clear_temp=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/openlrc/openlrc.py", line 370, in run
producer.result()
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/openlrc/openlrc.py", line 122, in produce_transcriptions
segments, info = self.transcriber.transcribe(audio_path, language=src_lang)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/openlrc/transcribe.py", line 81, in transcribe
seg_gen, info = self.whisper_model.transcribe(str(audio_path), language=language, **self.asr_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 523, in transcribe
features = torch.stack(
^^^^^^^^^^^^
RuntimeError: stack expects a non-empty TensorList
There is an existing PR for Faster-Whisper that implements early stopping for non-voice audio: https://github.com/SYSTRAN/faster-whisper/pull/1014. Until it's merged, there seems to be no straightforward way to stop early without adding an extra VAD pass, which is computationally intensive and unnecessary for most users.
As a workaround, you could run voice activity detection with pyannote on your local machine before sending the audio to openlrc, along the lines of the sketch below.
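A minimal sketch with pyannote.audio 3.x; the model choice, hyperparameter values, and file path are illustrative, and a Hugging Face access token is required to download the pretrained model:

```python
from pyannote.audio import Model
from pyannote.audio.pipelines import VoiceActivityDetection

# Load a pretrained segmentation model and build a VAD pipeline from it.
model = Model.from_pretrained('pyannote/segmentation-3.0', use_auth_token='HF_TOKEN')
pipeline = VoiceActivityDetection(segmentation=model)
pipeline.instantiate({'min_duration_on': 0.0, 'min_duration_off': 0.0})

# The pipeline returns an Annotation; an empty timeline means no active speech.
speech_regions = pipeline('no-speech_preprocessed.wav').get_timeline().support()
if not speech_regions:
    print('No active speech found, skipping openlrc for this file')
```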
This should be fixed with the latest version of Faster-Whisper in v1.6.0. Please reopen if the issue persists.
Thank you for following up on this issue and releasing version 1.6.0.
In faster-whisper==1.1.0 (used by openlrc==1.6.0), `VadOptions` has a member named `onset`, not `threshold`:
# https://github.com/SYSTRAN/faster-whisper/blob/v1.1.0/faster_whisper/vad.py#L37
class VadOptions:
    onset: float = 0.5
This causes the following error:
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/openlrc/openlrc.py", line 122, in produce_transcriptions
segments, info = self.transcriber.transcribe(audio_path, language=src_lang)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/openlrc/transcribe.py", line 86, in transcribe
seg_gen, info = self.whisper_model.transcribe(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user00/gitspace/video_tools/.venv/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 404, in transcribe
vad_parameters = VadOptions(
^^^^^^^^^^^
TypeError: VadOptions.__init__() got an unexpected keyword argument 'threshold'
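Until openlrc adapts to the rename, one local option is to translate the old key before it reaches faster-whisper. Purely illustrative; the helper name is mine, and it assumes you can intercept the vad_parameters dict that openlrc builds:

```python
def adapt_vad_options(vad_parameters: dict) -> dict:
    """Map the pre-1.1.0 'threshold' key to the faster-whisper 1.1.0 'onset' key."""
    vad_parameters = dict(vad_parameters)  # don't mutate the caller's dict
    if 'threshold' in vad_parameters and 'onset' not in vad_parameters:
        vad_parameters['onset'] = vad_parameters.pop('threshold')
    return vad_parameters
```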
Alternatively, pinning faster-whisper to the following commit solves the problem:
faster-whisper = { url = "https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz" }
Thanks! I've updated this dependency.