
noScribe 0.6 crashing after transcription is finished (Windows + CUDA, VTT/TXT output, long files)

Open kaixxx opened this issue 9 months ago • 23 comments

Two users reported a strange issue: under specific circumstances, noScribe crashes after the transcript is finished. It seems this happens only...

  • on Windows using CUDA,
  • when creating txt or vtt output, not the default html,
  • with long audio files

@Lod3 created a nice video demonstrating the issue:

https://github.com/user-attachments/assets/58c8e4e9-9113-4667-bbc5-f47de6489a38

kaixxx avatar Mar 06 '25 10:03 kaixxx

For now, a simple workaround should be:

  • Use the default html output, and
  • when the file opens in the noScribe editor, use File > Save As to convert it to txt or vtt format (this is a new feature in noScribe 0.6).

The issue is a bit mysterious, since noScribe should be idle when it crashes. I will do some further investigations.

kaixxx avatar Mar 06 '25 10:03 kaixxx

However, one question regarding the log file: The transcribed text seems to end mid-sentence. Are you sure this is where the audio ends? Or did you cut out some text for posting it here?

It ended because the recording technician just ended the sound input from the mic. The source is my recording of a Zoom session of a PID workshop/presentation/webinar. So I am sure the transcription is complete and noScribe finished.

This is the source material: https://xob.quickconnect.to/d/s/12QXiNoR84qC0MkcazXniJMSDhIyj91s/m1BNcZGNsVjWN75W4kdgjaGT6KhljeXi-67qgXWaRGgw

The password is the name of the application we are all using no caps.

Lod3 avatar Mar 06 '25 11:03 Lod3

It ended because the recording technician just ended the sound input from the mic.

OK, fine, so it is exactly the same issue we've seen before. The transcription is completely finished and then noScribe crashes. Very strange. What happens if you transcribe only the last few minutes, setting the Start value accordingly?

kaixxx avatar Mar 06 '25 11:03 kaixxx

Hi,

Same issue I'm afraid.

https://github.com/user-attachments/assets/78fd3b52-deb5-458c-9095-140c2bc0a279

Lod3 avatar Mar 06 '25 12:03 Lod3

Same issue I'm afraid.

Ok, thanks. This suggests that there is something about this particular audio recording that triggers the issue. Do other recordings work fine for you? I have downloaded and tested the audio on my machine (non-CUDA), but I cannot reproduce the issue.

Something I noticed, however: For me, the transcription ends with "And we know that we are... ..." In your video, it shows some hallucinated text at the end which is also not in the audio ("fire and pick up our window" etc.). I'm not sure if this is really the root cause of the crash, since the transcription finishes just fine. But it is quite strange indeed. I have an idea how to prevent this hallucination from happening. Would you be able to run noScribe from source? That would be a great help in troubleshooting.

kaixxx avatar Mar 06 '25 14:03 kaixxx

Hmm, ok, I will try to run it from source. I might need some help though. I will try again with another file before trying to build it from source.

Lod3 avatar Mar 06 '25 14:03 Lod3

I used another file, and it finished the transcription without crashing. However, that file is about 25% shorter than the other one. I will try another long video to transcribe.

I will try and build noScribe from source now and try the other file again. Are there any build instructions? Thanks again for the support.

Lod3 avatar Mar 06 '25 15:03 Lod3

No need to "build" noScribe. It's a Python script that can be run directly from source. I've asked GPT-4 to write some nice instructions for you (I've checked and corrected them, of course):

Step 1: Install Git

  1. Download and Install Git
    • Visit git-scm.com and download the latest version of Git for Windows.
    • Run the installer and follow the instructions. Generally, the default options work fine.
    • Ensure Git is added to your system's PATH during installation.

Step 2: Install Python 3.12

  1. Download Python 3.12

    • Visit python.org and download the Python 3.12 installer for Windows.
  2. Install Python 3.12

    • Run the installer.
    • Make sure to check "Add Python 3.12 to PATH" before proceeding with the installation.
    • Follow the on-screen prompts to complete the installation.

Step 3: Set Up a Virtual Environment

  1. Open Command Prompt

    • Press Win + R, type cmd, and hit Enter.
  2. Navigate to Your Preferred Directory

    • Change directories to where you want to download the app. Use the cd command:
      cd C:\path\to\your\desired\folder
      
  3. Clone the "noScribe" Repository

    • Enter the command:
      git clone https://github.com/kaixxx/noScribe.git
      
    • Change into the cloned "noScribe" directory:
      cd noScribe
      
  4. Create a Virtual Environment

    • Enter the following command to create a virtual environment named "noscribe":
      python -m venv noscribe
      
  5. Activate the Virtual Environment

    • Execute the following command:
      noscribe\Scripts\activate
      
    • The prompt should now start with (noscribe), indicating the virtual environment is activated.

Step 4: Install the Required Packages

  1. Install Requirements
    • Ensure you are in the "noScribe" directory and the virtual environment is activated.
    • Run the command:
      pip install -r environments\requirements_win_cuda.txt
      

Step 5: Download whisper models for transcription

In the folder models, you will find instructions on how to download the AI models for the transcription. They are too large to be hosted directly on GitHub. You can also copy the model files from your existing noScribe installation.

Step 6: Run the App

  1. Execute noScribe.py
    • Launch the application by typing:
      python noScribe.py
      

Troubleshooting

  • If any command returns "command not found" or "not recognized", verify that Git or Python is installed correctly and added to your system PATH.
  • Ensure you are in the right directory (noScribe) when executing commands.
  • Check that your virtual environment is activated, as indicated by (noscribe) in your command prompt.
  • Note that the noScribe editor is not going to work, which is not a big deal since you are working with text files anyway.

kaixxx avatar Mar 06 '25 15:03 kaixxx

C:\Users\lode_\scripts\noScribe (main -> origin)
(noscribe) λ pip install -r environments\requirements_win_cuda.txt
Collecting torchaudio (from -r environments\requirements_win_cuda.txt (line 4))
  Downloading torchaudio-2.6.0-cp313-cp313-win_amd64.whl.metadata (6.7 kB)
Collecting AdvancedHTMLParser (from -r environments\requirements_win_cuda.txt (line 8))
  Downloading AdvancedHTMLParser-9.0.2.tar.gz (315 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting appdirs (from -r environments\requirements_win_cuda.txt (line 9))
  Using cached appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting cpufeature (from -r environments\requirements_win_cuda.txt (line 10))
  Downloading cpufeature-0.2.1.tar.gz (14 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting customtkinter (from -r environments\requirements_win_cuda.txt (line 11))
  Downloading customtkinter-5.2.2-py3-none-any.whl.metadata (677 bytes)
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
  Downloading faster_whisper-1.1.1-py3-none-any.whl.metadata (16 kB)
Collecting Pillow (from -r environments\requirements_win_cuda.txt (line 13))
  Downloading pillow-11.1.0-cp313-cp313-win_amd64.whl.metadata (9.3 kB)
Collecting pyannote.audio>=3.3.2 (from -r environments\requirements_win_cuda.txt (line 14))
  Downloading pyannote.audio-3.3.2-py2.py3-none-any.whl.metadata (11 kB)
Collecting pyinstaller<=6.4.0 (from -r environments\requirements_win_cuda.txt (line 15))
  Downloading pyinstaller-4.5.1-py3-none-win_amd64.whl.metadata (7.1 kB)
Collecting python-i18n (from -r environments\requirements_win_cuda.txt (line 18))
  Downloading python_i18n-0.3.9-py3-none-any.whl.metadata (5.5 kB)
Collecting PyYAML (from -r environments\requirements_win_cuda.txt (line 19))
  Downloading PyYAML-6.0.2-cp313-cp313-win_amd64.whl.metadata (2.1 kB)
Collecting torch==2.6.0 (from torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Downloading torch-2.6.0-cp313-cp313-win_amd64.whl.metadata (28 kB)
Collecting filelock (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Downloading filelock-3.17.0-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.10.0 (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting networkx (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Downloading networkx-3.4.2-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Downloading jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting fsspec (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Downloading fsspec-2025.2.0-py3-none-any.whl.metadata (11 kB)
Collecting setuptools (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Using cached setuptools-75.8.2-py3-none-any.whl.metadata (6.7 kB)
Collecting sympy==1.13.1 (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy==1.13.1->torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting QueryableList (from AdvancedHTMLParser->-r environments\requirements_win_cuda.txt (line 8))
  Downloading QueryableList-3.1.0.tar.gz (55 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting darkdetect (from customtkinter->-r environments\requirements_win_cuda.txt (line 11))
  Using cached darkdetect-0.8.0-py3-none-any.whl.metadata (3.6 kB)
Collecting packaging (from customtkinter->-r environments\requirements_win_cuda.txt (line 11))
  Using cached packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
INFO: pip is looking at multiple versions of faster-whisper to determine which version is compatible with other requirements. This could take a while.
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
  Downloading faster_whisper-1.1.0-py3-none-any.whl.metadata (16 kB)
  Downloading faster_whisper-1.0.3-py3-none-any.whl.metadata (15 kB)
Collecting av<13,>=11.0 (from faster-whisper->-r environments\requirements_win_cuda.txt (line 12))
  Downloading av-12.3.0.tar.gz (3.8 MB)
     ---------------------------------------- 3.8/3.8 MB 12.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
  Downloading faster_whisper-1.0.2-py3-none-any.whl.metadata (15 kB)
  Downloading faster_whisper-1.0.1-py3-none-any.whl.metadata (14 kB)
Collecting av==11.* (from faster-whisper->-r environments\requirements_win_cuda.txt (line 12))
  Downloading av-11.0.0.tar.gz (3.7 MB)
     ---------------------------------------- 3.7/3.7 MB 12.3 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
  Downloading faster_whisper-1.0.0-py3-none-any.whl.metadata (14 kB)
  Downloading faster_whisper-0.10.1-py3-none-any.whl.metadata (11 kB)
Collecting av==10.* (from faster-whisper->-r environments\requirements_win_cuda.txt (line 12))
  Downloading av-10.0.0.tar.gz (2.4 MB)
     ---------------------------------------- 2.4/2.4 MB 12.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [78 lines of output]
      Compiling av\buffer.pyx because it changed.
      [1/1] Cythonizing av\buffer.pyx
      Compiling av\bytesource.pyx because it changed.
      [1/1] Cythonizing av\bytesource.pyx
      Compiling av\descriptor.pyx because it changed.
      [1/1] Cythonizing av\descriptor.pyx
      Compiling av\dictionary.pyx because it changed.
      [1/1] Cythonizing av\dictionary.pyx
      Compiling av\enum.pyx because it changed.
      [1/1] Cythonizing av\enum.pyx
      Compiling av\error.pyx because it changed.
      [1/1] Cythonizing av\error.pyx
      Compiling av\format.pyx because it changed.
      [1/1] Cythonizing av\format.pyx
      Compiling av\frame.pyx because it changed.
      [1/1] Cythonizing av\frame.pyx
      performance hint: av\logging.pyx:232:5: Exception check on 'log_callback' will always require the GIL to be acquired.
      Possible solutions:
          1. Declare 'log_callback' as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
          2. Use an 'int' return type on 'log_callback' to allow an error code to be returned.

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      cdef const char *log_context_name(void *ptr) nogil:
          cdef log_context *obj = <log_context*>ptr
          return obj.name

      cdef lib.AVClass log_class
      log_class.item_name = log_context_name
                            ^
      ------------------------------------------------------------

      av\logging.pyx:216:22: Cannot assign type 'const char *(void *) except? NULL nogil' to 'const char *(*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to the type of 'log_context_name'.

      Error compiling Cython file:
      ------------------------------------------------------------
      ...

      # Start the magic!
      # We allow the user to fully disable the logging system as it will not play
      # nicely with subinterpreters due to FFmpeg-created threads.
      if os.environ.get('PYAV_LOGGING') != 'off':
          lib.av_log_set_callback(log_callback)
                                  ^
      ------------------------------------------------------------

      av\logging.pyx:351:28: Cannot assign type 'void (void *, int, const char *, va_list) except * nogil' to 'av_log_callback' (alias of 'void (*)(void *, int, const char *, va_list) noexcept nogil'). Exception values are incompatible. Suggest adding 'noexcept' to the type of 'log_callback'.
      Compiling av\logging.pyx because it changed.
      [1/1] Cythonizing av\logging.pyx
      Traceback (most recent call last):
        File "C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
          ~~~~^^
        File "C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
          self.run_setup()
          ~~~~~~~~~~~~~~^^
        File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 522, in run_setup
          super().run_setup(setup_script=setup_script)
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
          exec(code, locals())
          ~~~~^^^^^^^^^^^^^^^^
        File "<string>", line 157, in <module>
        File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\Cython\Build\Dependencies.py", line 1154, in cythonize
          cythonize_one(*args)
          ~~~~~~~~~~~~~^^^^^^^
        File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\Cython\Build\Dependencies.py", line 1321, in cythonize_one
          raise CompileError(None, pyx_file)
      Cython.Compiler.Errors.CompileError: av\logging.pyx
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.

[notice] A new release of pip is available: 24.2 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

C:\Users\lode_\scripts\noScribe (main -> origin)

So a lot of modules did not get installed. No issue: I tried starting noScribe, and every time it was missing a module, I installed it manually, until ctranslate2:

C:\Users\lode_\scripts\noScribe (main -> origin)
(noscribe) λ python noScribe.py
C:\Users\lode_\scripts\noScribe\noScribe.py:266: SyntaxWarning: invalid escape sequence '\['
  timestamp_re = re.compile('\[\d\d:\d\d:\d\d.\d\d\d --> \d\d:\d\d:\d\d.\d\d\d\]')
C:\Users\lode_\scripts\noScribe\noScribe.py:380: SyntaxWarning: invalid escape sequence '\d'
  self.valid = re.compile('^\d{0,2}(:\d{0,2}(:\d{0,2})?)?$', re.I)
Traceback (most recent call last):
  File "C:\Users\lode_\scripts\noScribe\noScribe.py", line 41, in <module>
    from ctranslate2 import get_cuda_device_count
ModuleNotFoundError: No module named 'ctranslate2'

I tried installing ctranslate2 with pip install ctranslate2, according to https://opennmt.net/CTranslate2/installation.html, which gave me:

(noscribe) λ pip install ctranslate2
ERROR: Could not find a version that satisfies the requirement ctranslate2 (from versions: none)
ERROR: No matching distribution found for ctranslate2

Any ideas?

Lod3 avatar Mar 07 '25 15:03 Lod3

Oh of course -_- https://github.com/OpenNMT/CTranslate2/issues/1817 -_-

C:\Users\lode_\scripts\noScribe (main -> origin)
(noscribe) λ python --version
Python 3.13.0

Lod3 avatar Mar 07 '25 15:03 Lod3

Yes, I would generally avoid Python 3.13 for now, since many libraries are not yet compatible.

kaixxx avatar Mar 07 '25 16:03 kaixxx

Ok, we got a little further; now I get the following error. It might be the last one?

(noscribe) λ python noScribe.py
C:\Users\lode_\scripts\noScribe\noScribe.py:266: SyntaxWarning: invalid escape sequence '\['
  timestamp_re = re.compile('\[\d\d:\d\d:\d\d.\d\d\d --> \d\d:\d\d:\d\d.\d\d\d\]')
C:\Users\lode_\scripts\noScribe\noScribe.py:380: SyntaxWarning: invalid escape sequence '\d'
  self.valid = re.compile('^\d{0,2}(:\d{0,2}(:\d{0,2})?)?$', re.I)
Traceback (most recent call last):
  File "C:\Users\lode_\scripts\noScribe\noScribe.py", line 227, in <module>
    from i18n import t
ImportError: cannot import name 't' from 'i18n' (C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\i18n\__init__.py)

Lod3 avatar Mar 08 '25 15:03 Lod3

@Lod3: Sorry, I've missed your last message.

  • You can ignore the warnings about the "invalid escape sequence"
  • The error regarding i18n (used for localization) is strange. Try pip install python-i18n --upgrade, you may have another module with the same name or an older version installed.
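
As an aside, those "invalid escape sequence" warnings come from regex patterns written as plain strings; a raw-string prefix would silence them without changing behavior. A minimal sketch, reproducing the pattern from noScribe.py line 266:

```python
import re

# Same pattern as in noScribe.py, but with an r'' prefix so Python does
# not interpret the regex backslashes as (invalid) string escapes.
timestamp_re = re.compile(r'\[\d\d:\d\d:\d\d.\d\d\d --> \d\d:\d\d:\d\d.\d\d\d\]')

match = timestamp_re.search('[00:01:02.345 --> 00:01:05.678] Some transcript text')
print(match.group(0))
```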

If you have noScribe running, please first test whether the error is still there. After that, you can go to line 1431 in the source and change it to: temperature=0.0, (make sure to also remove the '#' in front and include the ',' at the end). In context, it should look like this:

                    segments, info = model.transcribe(
                        self.tmp_audio_file, # audio, 
                        language=whisper_lang,
                        multilingual=multilingual, 
                        beam_size=5, 
                        temperature=0.0, 
                        word_timestamps=True, 
                        #initial_prompt=self.prompt,
                        hotwords=self.prompt, 
                        vad_filter=True,
                        vad_parameters=vad_parameters,
                        # length_penalty=0.5
                    )

The "temperature" parameter controls how hard whisper tries to make sense of passages which are difficult to understand. Setting this to a low value can reduce hallucination. We often see such hallucinations at the end of the recording, when whisper is trying (too) hard to make sense of the very last fragment of audio that has no real information in it anymore. I'm curious to see if this parameter influences the crashes that you were experiencing.

kaixxx avatar Mar 12 '25 11:03 kaixxx

You are experiencing the same issues I have with my own homebrew .py script for audio transcription and speaker recognition. It crashes Python in the very same way with long audio files.

Event 1 (Provider: Application Error, EventID 1000, Level 2, Task 100, Keywords 0x80000000000000)

   TimeCreated:      2025-03-13T13:46:46.4939905Z
   EventRecordID:    136237
   Computer:         DESKTOP-00M8NI3
   Faulting app:     noScribe.exe 0.0.0.0 (timestamp 67ad01d1)
   Faulting module:  KERNELBASE.dll 10.0.19041.5486 (timestamp 09ce69c7)
   Exception code:   e06d7363
   Fault offset:     000000000003b699
   Faulting process: 3c34 (start time 01db941bb8ebfd22)
   App path:         G:\SD\noScribe\noScribe.exe
   Module path:      C:\Windows\System32\KERNELBASE.dll
   Report ID:        2c583055-807c-4201-9ec7-367b0e9ef285

Event 2 (Provider: Application Error, EventID 1000, Level 2, Task 100, Keywords 0x80000000000000)

   TimeCreated:      2025-03-13T13:46:52.0847497Z
   EventRecordID:    136238
   Computer:         DESKTOP-00M8NI3
   Faulting app:     noScribe.exe 0.0.0.0 (timestamp 67ad01d1)
   Faulting module:  ucrtbase.dll 10.0.19041.3636 (timestamp 81cf5d89)
   Exception code:   c0000409
   Fault offset:     000000000007286e
   Faulting process: 3c34 (start time 01db941bb8ebfd22)
   App path:         G:\SD\noScribe\noScribe.exe
   Module path:      C:\Windows\System32\ucrtbase.dll
   Report ID:        f3b6d1cd-3010-481c-8ce8-1f49e662c878

Mangaclub avatar Mar 13 '25 13:03 Mangaclub

@Mangaclub: Is your script also using whisper large-v3-turbo together with cuda? Could you share your script, or at least the part dealing with whisper?

kaixxx avatar Mar 13 '25 16:03 kaixxx

I'll see what I can do; it's closed source right now, as it's for a company project. But I honestly doubt it's whisper (happens with all of them). It crashes on the diarization / pyannote workflow when it's done and sends the speakers back. We both crash with a KERNELBASE.dll crash, which is not trivial, but IMO it sounds more like there's a buffer overflow or something directly inside pyannote or surrounding components. I need to debug this as well, but I wouldn't be surprised if downgrading pyannote's version would fix things... maybe...

Mangaclub avatar Mar 14 '25 08:03 Mangaclub

It crashes on the diarization / pyannote workflow when it's done and sends the speakers back.

In this case, I don't think it is the same error, since the crash described here is happening much later in the process.

kaixxx avatar Mar 14 '25 10:03 kaixxx

@Lod3: Have you got it running from source? I have another idea what to test, but no machine with NVIDIA graphics at hand.

kaixxx avatar Mar 19 '25 13:03 kaixxx

I had to order a new PSU. My current one caused hard power cuts whenever I tried to run a transcode or game. It should arrive soon this week or beginning next week.

Lod3 avatar Mar 19 '25 13:03 Lod3

I had to order a new PSU

Ah, OK. Good luck with that :)

kaixxx avatar Mar 19 '25 14:03 kaixxx

Little update: I think we have the same problem. It turns out it was the way I called faster-whisper; I fixed it with multiprocessing.

Since I am not allowed to share source code, here is a simple overview. Maybe it helps, but it is oversimplified.

Initialisation:

# Creating a Whisper model instance
model = WhisperModel(
    model_size_or_path=whisper_model_name,  # e.g., "large-v2"
    device=device,                          # "cuda" or "cpu"
    compute_type=compute_type,              # "float16" or "float32"
    download_root=f"models/whisper-{whisper_model_name}"
)

Multiprocessing Implementation: Transcription runs in a separate process to prevent UI freezing / crashing. Results are passed back via a multiprocessing Queue.

# In start_transcription method:
self.result_queue = multiprocessing.Queue()
self.transcription_process = multiprocessing.Process(
    target=run_transcription_process, 
    args=(self.audio_file, self.compute_type_var.get(), ...)
)
self.transcription_process.start()

Result Handling: The main thread periodically checks for results using wait_for_transcription_result. When results arrive, they're displayed and optionally passed to diarization.

Speaker Diarization Process: Speaker diarization identifies who spoke when.

Model Loading:

# Initialize SpeechBrain speaker recognition model
self.speaker_recognition_model = SpeakerRecognition.from_hparams(
    source=speaker_reco_model_name,  # e.g., "speechbrain/spkrec-ecapa-voxceleb"
    savedir=f"models/speechbrain-{speaker_reco_model_name.split('/')[-1]}",
    run_opts={"device": device}
)

Voice Embedding Generation:

For each transcription segment, extract the audio and create speaker embeddings. Embeddings are vector representations of voice characteristics.

Clustering for Speaker Identification:

# Group similar voices using clustering
clustering = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit(embeddings)
labels = clustering.labels_
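
To illustrate the clustering step above in isolation, here is a minimal sketch with synthetic vectors in place of real speaker embeddings (assumes NumPy and scikit-learn; sizes and values are made up):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)

# Synthetic stand-ins for speaker embeddings: five 192-dim vectors around
# one center and five around a distant one, i.e. two clearly distinct "voices".
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(5, 192)),
    rng.normal(loc=5.0, scale=0.1, size=(5, 192)),
])

# Ward-linkage clustering groups the vectors into the requested number
# of clusters; each label then stands for one speaker.
clustering = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(embeddings)
labels = clustering.labels_
print(labels)
```

In practice n_clusters has to come from somewhere (user input or a distance threshold), which is one of the harder parts of diarization.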

Mangaclub avatar Mar 19 '25 23:03 Mangaclub

@Mangaclub Thank you for showing more of your implementation, this helps. However, I don't fully understand where exactly in your process the crash happened and what you did to fix it (something with multiprocessing, but how was it implemented before?)...

kaixxx avatar Mar 20 '25 14:03 kaixxx

Okay, since I cannot really share the whole code, here's an AI summary, because my brain is mush from a long night ;)

The Original Script (Before the Improvement):

In the original script, the relevant functions for transcription are:

start_transcription(self): This function is called when the user clicks "Start Transcription". It initiates a new thread to execute the actual transcription work in the run_transcription(self) function.

def start_transcription(self):
    if not self.audio_file:
        messagebox.showwarning("Warning", "Please select an audio file first.")
        return
    self.status_var.set("Starting transcription...")
    threading.Thread(target=self.run_transcription, daemon=True).start()

run_transcription(self): This function contains the code to load the Whisper model and perform the transcription.

def run_transcription(self):
    try:
        self.status_var.set("Loading Whisper model...")
        self.root.update_idletasks()
        compute_type = self.compute_type_var.get()
        whisper_model_name = self.whisper_model_var.get()
        device = self.device_var.get()
        model = WhisperModel(
            model_size_or_path=whisper_model_name,
            device=device,
            compute_type=compute_type,
            download_root=f"models/whisper-{whisper_model_name}"
        )
        language = None if self.language_var.get() == "auto" else self.language_var.get()
        self.status_var.set("Transcribing audio...")
        self.root.update_idletasks()
        segments, info = model.transcribe(
            self.audio_file,
            language=language,
            vad_filter=True,
            word_timestamps=True
        )
        self.transcription = {
            "segments": [{
                "start": seg.start,
                "end": seg.end,
                "text": seg.text.strip(),
                "words": [{"word": w.word, "start": w.start, "end": w.end, "probability": w.probability} for w in seg.words]
            } for seg in segments],
            "language": info.language,
            "language_probability": info.language_probability
        }
        self.display_transcription()
        if self.diarize_var.get():
            self.status_var.set("Running speaker diarization...")
            self.root.update_idletasks()
            self.run_speechbrain_diarization()
        else:
            self.status_var.set("Transcription completed.")
    except Exception as e:
        self.status_var.set(f"Error: {str(e)}")
        messagebox.showerror("Error", f"Transcription failed: {str(e)}")

The Original Way of Calling and Handling Whisper:

  • The transcription was performed within the run_transcription function.
  • This function was executed in a separate thread using threading.Thread.

The Problem with the Original Approach (Using Threads):

While the idea of using a separate thread for the computationally intensive transcription is correct to avoid completely freezing the GUI, it is often not sufficient to prevent significant UI freezing for CPU-bound tasks like Whisper transcription in Python. The reason for this is the Global Interpreter Lock (GIL) in CPython (the standard Python implementation). The GIL allows only one thread to execute Python bytecode at a time. For I/O-bound tasks, the GIL is released, allowing other threads to work.

However, Whisper transcription is heavily CPU-bound. Even though a separate thread was used, the Whisper calculations could still consume so much of the CPU time (due to the GIL) that the main GUI thread became unresponsive, leading to the perceived "crash" or freezing. The root.update_idletasks() calls attempted to update the GUI during the process but were not enough in this case.

The New Way of Calling and Handling Whisper (in the Improved Code):

In the improved code, the transcription is offloaded to a separate process instead of a thread:

  1. run_transcription_process(audio_file, compute_type, whisper_model_name, device, language, result_queue): This new function contains the transcription code and is used as the target for the new process.
def run_transcription_process(audio_file, compute_type, whisper_model_name, device, language, result_queue):
    from faster_whisper import WhisperModel
    import logging

    logging.basicConfig(level=logging.DEBUG)

    try:
        model = WhisperModel(
            model_size_or_path=whisper_model_name,
            device=device,
            compute_type=compute_type,
            download_root=f"models/whisper-{whisper_model_name}"
        )
        language_code = language if language != "auto" else None
        segments, info = model.transcribe(
            audio_file,
            language=language_code,
            vad_filter=True,
            word_timestamps=True
        )
        segments_list = []
        for segment in segments:
            segments_list.append({
                "start": segment.start,
                "end": segment.end,
                "text": segment.text.strip(),
                "words": [{"word": w.word, "start": w.start, "end": w.end, "probability": w.probability} for w in segment.words]
            })
        info_dict = {"language": info.language, "language_probability": info.language_probability}
        results = {"segments": segments_list, "info": info_dict}
        result_queue.put(results)
    except Exception as e:
        logging.error(f"Error during transcription in process: {e}")
        result_queue.put(None)
  2. Using multiprocessing.Process: In the start_transcription function, a multiprocessing.Process is now created and started:
self.result_queue = multiprocessing.Queue()
self.transcription_process = multiprocessing.Process(target=run_transcription_process, args=(
    self.audio_file,
    self.compute_type_var.get(),
    self.whisper_model_var.get(),
    self.device_var.get(),
    self.language_var.get() if self.language_var.get() != "auto" else None,
    self.result_queue
))
self.transcription_process.start()
self.root.after(500, self.wait_for_transcription_result)
  3. Communication via multiprocessing.Queue: The transcription results are sent back to the main process via a multiprocessing.Queue (self.result_queue).

  4. Waiting for the Result in the Main Process: The wait_for_transcription_result function in the main process waits for the transcription process to finish and then retrieves the results from the queue.

Why the "Crash" (Freezing) Doesn't Happen Now:

  • Bypassing the GIL: Each multiprocessing process has its own Python interpreter and therefore its own GIL. This allows the CPU-intensive Whisper transcription calculations to run in a separate process without being limited by the GIL of the main process (which handles the GUI).
  • True Parallelism: Multiprocessing enables true parallel execution on systems with multiple cores or processors, which can potentially reduce the transcription time (compared to the threading solution, which often doesn't benefit from multiple cores due to the GIL).

In Summary:

The key difference is that the original script used threading, which is often insufficient to prevent GUI freezing or crashing for CPU-bound tasks due to the GIL. The improved code uses multiprocessing, which creates separate processes and thus bypasses the GIL, leading to a more responsive application: the computationally intensive Whisper transcription runs in the background without blocking the main GUI thread.

Mangaclub avatar Mar 20 '25 15:03 Mangaclub