noScribe 0.6 crashing after transcription is finished (Windows + CUDA, VTT/TXT output, long files)
Two users have reported a strange issue: under specific circumstances, noScribe crashes after the transcript is finished. It seems this happens only...
- on Windows using CUDA,
- when creating txt or vtt output, not the default html,
- with long audio files
@Lod3 created a nice video demonstrating the issue:
https://github.com/user-attachments/assets/58c8e4e9-9113-4667-bbc5-f47de6489a38
For now, a simple workaround should be:
- Use the default html output, and
- when the file opens in the noScribe editor, use File > Save As to convert it to txt or vtt format (this is a new feature in noScribe 0.6).
The issue is a bit mysterious, since noScribe should be idle when it crashes. I will do some further investigations.
However, one question regarding the log file: The transcribed text seems to end mid-sentence. Are you sure this is where the audio ends? Or did you cut out some text for posting it here?
It ended because the recording technician just ended the sound input from the mic. The source is my recording of a Zoom session of a PID workshop/presentation/webinar. So I am sure the transcription is complete and noScribe finished.
This is the source material: https://xob.quickconnect.to/d/s/12QXiNoR84qC0MkcazXniJMSDhIyj91s/m1BNcZGNsVjWN75W4kdgjaGT6KhljeXi-67qgXWaRGgw
The password is the name of the application we are all using, no caps.
> It ended because the recording technician just ended the sound input from the mic.
OK, fine, so it is exactly the same issue we've seen before. The transcription is completely finished and then noScribe crashes. Very strange.
What happens if you transcribe only the last few minutes, setting the Start value accordingly?
Hi,
Same issue, I'm afraid.
https://github.com/user-attachments/assets/78fd3b52-deb5-458c-9095-140c2bc0a279
> Same issue, I'm afraid.
Ok, thanks. This suggests that there is something about this particular audio recording that triggers the issue. Do other recordings work fine for you? I have downloaded and tested the audio on my machine (non-CUDA), but I cannot reproduce the issue.
Something I noticed, however: For me, the transcription ends with "And we know that we are... ..." In your video, some hallucinated text appears at the end which is also not in the audio ("fire and pick up our window" etc.). I'm not sure if this is really the root cause of the crash, since the transcription finishes just fine. But it is quite strange indeed. I have an idea how to prevent this hallucination from happening. Would you be able to run noScribe from source? This would be a great help in troubleshooting.
Hmm, ok, I will try to run it from source. I might need some help, though. I will try again with another file before trying to run it from source.
I used another file and it finishes the transcription and does not crash. However, that source file is about 25% shorter than the other file. I will try another long video to transcribe.
I will try and build noScribe from source now and try the other file again. Are there any build instructions? Thanks again for the support.
No need to "build" noScribe. It's a Python script that can be run directly from source. I've asked GPT-4 to write some nice instructions for you (I've checked and corrected them, of course):
Step 1: Install Git
- Download and Install Git
  - Visit git-scm.com and download the latest version of Git for Windows.
  - Run the installer and follow the instructions. Generally, the default options work fine.
  - Ensure Git is added to your system's PATH during installation.
Step 2: Install Python 3.12
- Download Python 3.12
  - Go to python.org/downloads.
  - Click on "Download Python 3.12.x" for Windows.
- Install Python 3.12
  - Run the installer.
  - Make sure to check "Add Python 3.12 to PATH" before proceeding with the installation.
  - Follow the on-screen prompts to complete the installation.
Step 3: Set Up a Virtual Environment
- Open Command Prompt
  - Press Win + R, type cmd, and hit Enter.
- Navigate to Your Preferred Directory
  - Change directories to where you want to download the app. Use the cd command: cd C:\path\to\your\desired\folder
- Clone the "noScribe" Repository
  - Enter the command: git clone https://github.com/kaixxx/noScribe.git
  - Change into the cloned "noScribe" directory: cd noScribe
- Create a Virtual Environment
  - Enter the following command to create a virtual environment named "noscribe": python -m venv noscribe
- Activate the Virtual Environment
  - Execute the following command: noscribe\Scripts\activate
  - The prompt should now start with (noscribe), indicating the virtual environment is activated.
Step 4: Install the Required Packages
- Install Requirements
  - Ensure you are in the "noScribe" directory and the virtual environment is activated.
  - Run the command: pip install -r environments\requirements_win_cuda.txt
Step 5: Download whisper models for transcription
In the folder models, you will find instructions on how to download the AI models for the transcription. They are too large to be hosted directly on GitHub. You can also copy the model files from your existing noScribe installation.
Step 6: Run the App
- Execute noScribe.py
  - Launch the application by typing: python noScribe.py
Troubleshooting
- If any command returns "command not found" or "not recognized", verify that Git or Python is installed correctly and added to your system PATH.
- Ensure you are in the right directory (noScribe) when executing commands.
- Check that your virtual environment is activated, as indicated by (noscribe) in your command prompt.
- Note that the noScribe editor is not going to work, which is not a big deal since you are working with text files anyway.
C:\Users\lode_\scripts\noScribe (main -> origin)
(noscribe) λ pip install -r environments\requirements_win_cuda.txt
Collecting torchaudio (from -r environments\requirements_win_cuda.txt (line 4))
Downloading torchaudio-2.6.0-cp313-cp313-win_amd64.whl.metadata (6.7 kB)
Collecting AdvancedHTMLParser (from -r environments\requirements_win_cuda.txt (line 8))
Downloading AdvancedHTMLParser-9.0.2.tar.gz (315 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting appdirs (from -r environments\requirements_win_cuda.txt (line 9))
Using cached appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting cpufeature (from -r environments\requirements_win_cuda.txt (line 10))
Downloading cpufeature-0.2.1.tar.gz (14 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting customtkinter (from -r environments\requirements_win_cuda.txt (line 11))
Downloading customtkinter-5.2.2-py3-none-any.whl.metadata (677 bytes)
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
Downloading faster_whisper-1.1.1-py3-none-any.whl.metadata (16 kB)
Collecting Pillow (from -r environments\requirements_win_cuda.txt (line 13))
Downloading pillow-11.1.0-cp313-cp313-win_amd64.whl.metadata (9.3 kB)
Collecting pyannote.audio>=3.3.2 (from -r environments\requirements_win_cuda.txt (line 14))
Downloading pyannote.audio-3.3.2-py2.py3-none-any.whl.metadata (11 kB)
Collecting pyinstaller<=6.4.0 (from -r environments\requirements_win_cuda.txt (line 15))
Downloading pyinstaller-4.5.1-py3-none-win_amd64.whl.metadata (7.1 kB)
Collecting python-i18n (from -r environments\requirements_win_cuda.txt (line 18))
Downloading python_i18n-0.3.9-py3-none-any.whl.metadata (5.5 kB)
Collecting PyYAML (from -r environments\requirements_win_cuda.txt (line 19))
Downloading PyYAML-6.0.2-cp313-cp313-win_amd64.whl.metadata (2.1 kB)
Collecting torch==2.6.0 (from torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Downloading torch-2.6.0-cp313-cp313-win_amd64.whl.metadata (28 kB)
Collecting filelock (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Downloading filelock-3.17.0-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.10.0 (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting networkx (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Downloading networkx-3.4.2-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Downloading jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting fsspec (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Downloading fsspec-2025.2.0-py3-none-any.whl.metadata (11 kB)
Collecting setuptools (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Using cached setuptools-75.8.2-py3-none-any.whl.metadata (6.7 kB)
Collecting sympy==1.13.1 (from torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy==1.13.1->torch==2.6.0->torchaudio->-r environments\requirements_win_cuda.txt (line 4))
Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting QueryableList (from AdvancedHTMLParser->-r environments\requirements_win_cuda.txt (line 8))
Downloading QueryableList-3.1.0.tar.gz (55 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting darkdetect (from customtkinter->-r environments\requirements_win_cuda.txt (line 11))
Using cached darkdetect-0.8.0-py3-none-any.whl.metadata (3.6 kB)
Collecting packaging (from customtkinter->-r environments\requirements_win_cuda.txt (line 11))
Using cached packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
INFO: pip is looking at multiple versions of faster-whisper to determine which version is compatible with other requirements. This could take a while.
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
Downloading faster_whisper-1.1.0-py3-none-any.whl.metadata (16 kB)
Downloading faster_whisper-1.0.3-py3-none-any.whl.metadata (15 kB)
Collecting av<13,>=11.0 (from faster-whisper->-r environments\requirements_win_cuda.txt (line 12))
Downloading av-12.3.0.tar.gz (3.8 MB)
---------------------------------------- 3.8/3.8 MB 12.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
Downloading faster_whisper-1.0.2-py3-none-any.whl.metadata (15 kB)
Downloading faster_whisper-1.0.1-py3-none-any.whl.metadata (14 kB)
Collecting av==11.* (from faster-whisper->-r environments\requirements_win_cuda.txt (line 12))
Downloading av-11.0.0.tar.gz (3.7 MB)
---------------------------------------- 3.7/3.7 MB 12.3 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting faster-whisper (from -r environments\requirements_win_cuda.txt (line 12))
Downloading faster_whisper-1.0.0-py3-none-any.whl.metadata (14 kB)
Downloading faster_whisper-0.10.1-py3-none-any.whl.metadata (11 kB)
Collecting av==10.* (from faster-whisper->-r environments\requirements_win_cuda.txt (line 12))
Downloading av-10.0.0.tar.gz (2.4 MB)
---------------------------------------- 2.4/2.4 MB 12.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [78 lines of output]
Compiling av\buffer.pyx because it changed.
[1/1] Cythonizing av\buffer.pyx
Compiling av\bytesource.pyx because it changed.
[1/1] Cythonizing av\bytesource.pyx
Compiling av\descriptor.pyx because it changed.
[1/1] Cythonizing av\descriptor.pyx
Compiling av\dictionary.pyx because it changed.
[1/1] Cythonizing av\dictionary.pyx
Compiling av\enum.pyx because it changed.
[1/1] Cythonizing av\enum.pyx
Compiling av\error.pyx because it changed.
[1/1] Cythonizing av\error.pyx
Compiling av\format.pyx because it changed.
[1/1] Cythonizing av\format.pyx
Compiling av\frame.pyx because it changed.
[1/1] Cythonizing av\frame.pyx
performance hint: av\logging.pyx:232:5: Exception check on 'log_callback' will always require the GIL to be acquired.
Possible solutions:
1. Declare 'log_callback' as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
2. Use an 'int' return type on 'log_callback' to allow an error code to be returned.
Error compiling Cython file:
------------------------------------------------------------
...
cdef const char *log_context_name(void *ptr) nogil:
cdef log_context *obj = <log_context*>ptr
return obj.name
cdef lib.AVClass log_class
log_class.item_name = log_context_name
^
------------------------------------------------------------
av\logging.pyx:216:22: Cannot assign type 'const char *(void *) except? NULL nogil' to 'const char *(*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to the type of 'log_context_name'.
Error compiling Cython file:
------------------------------------------------------------
...
# Start the magic!
# We allow the user to fully disable the logging system as it will not play
# nicely with subinterpreters due to FFmpeg-created threads.
if os.environ.get('PYAV_LOGGING') != 'off':
lib.av_log_set_callback(log_callback)
^
------------------------------------------------------------
av\logging.pyx:351:28: Cannot assign type 'void (void *, int, const char *, va_list) except * nogil' to 'av_log_callback' (alias of 'void (*)(void *, int, const char *, va_list) noexcept nogil'). Exception values are incompatible. Suggest adding 'noexcept' to the type of 'log_callback'.
Compiling av\logging.pyx because it changed.
[1/1] Cythonizing av\logging.pyx
Traceback (most recent call last):
File "C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
~~~~^^
File "C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
self.run_setup()
~~~~~~~~~~~~~~^^
File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 522, in run_setup
super().run_setup(setup_script=setup_script)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
exec(code, locals())
~~~~^^^^^^^^^^^^^^^^
File "<string>", line 157, in <module>
File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\Cython\Build\Dependencies.py", line 1154, in cythonize
cythonize_one(*args)
~~~~~~~~~~~~~^^^^^^^
File "C:\Users\lode_\AppData\Local\Temp\pip-build-env-t1b0hxdp\overlay\Lib\site-packages\Cython\Build\Dependencies.py", line 1321, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: av\logging.pyx
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
[notice] A new release of pip is available: 24.2 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
C:\Users\lode_\scripts\noScribe (main -> origin)
So a lot of modules did not get installed. No issue: I tried starting noScribe, and every time it was missing a module I installed it manually, until ctranslate2:
C:\Users\lode_\scripts\noScribe (main -> origin)
(noscribe) λ python noScribe.py
C:\Users\lode_\scripts\noScribe\noScribe.py:266: SyntaxWarning: invalid escape sequence '\['
timestamp_re = re.compile('\[\d\d:\d\d:\d\d.\d\d\d --> \d\d:\d\d:\d\d.\d\d\d\]')
C:\Users\lode_\scripts\noScribe\noScribe.py:380: SyntaxWarning: invalid escape sequence '\d'
self.valid = re.compile('^\d{0,2}(:\d{0,2}(:\d{0,2})?)?$', re.I)
Traceback (most recent call last):
File "C:\Users\lode_\scripts\noScribe\noScribe.py", line 41, in <module>
from ctranslate2 import get_cuda_device_count
ModuleNotFoundError: No module named 'ctranslate2'
I tried installing ctranslate2 with pip install ctranslate2 (according to https://opennmt.net/CTranslate2/installation.html), which gave me:
(noscribe) λ pip install ctranslate2
ERROR: Could not find a version that satisfies the requirement ctranslate2 (from versions: none)
ERROR: No matching distribution found for ctranslate2
Any ideas?
Oh of course -_- https://github.com/OpenNMT/CTranslate2/issues/1817 -_-
C:\Users\lode_\scripts\noScribe (main -> origin)
(noscribe) λ python --version
Python 3.13.0
Yes, I would generally avoid Python 3.13 for now since many libraries are not yet compatible with it.
Ok, we got a little further; now I get the following error. It might be the last one?
(noscribe) λ python noScribe.py
C:\Users\lode_\scripts\noScribe\noScribe.py:266: SyntaxWarning: invalid escape sequence '\['
timestamp_re = re.compile('\[\d\d:\d\d:\d\d.\d\d\d --> \d\d:\d\d:\d\d.\d\d\d\]')
C:\Users\lode_\scripts\noScribe\noScribe.py:380: SyntaxWarning: invalid escape sequence '\d'
self.valid = re.compile('^\d{0,2}(:\d{0,2}(:\d{0,2})?)?$', re.I)
Traceback (most recent call last):
File "C:\Users\lode_\scripts\noScribe\noScribe.py", line 227, in <module>
from i18n import t
ImportError: cannot import name 't' from 'i18n' (C:\Users\lode_\scripts\noScribe\noscribe\Lib\site-packages\i18n\__init__.py)
@Lod3: Sorry, I've missed your last message.
- You can ignore the warnings about the "invalid escape sequence"
- The error regarding i18n (used for localization) is strange. Try pip install python-i18n --upgrade; you may have another module with the same name or an older version installed.
If you have noScribe running, please test at first if the error is still there.
After that, you can go to line 1431 in the source and change it to: temperature=0.0, (make sure to also remove the '#' in front and include the ',' at the end).
In context, it should look like this:
segments, info = model.transcribe(
    self.tmp_audio_file,  # audio,
    language=whisper_lang,
    multilingual=multilingual,
    beam_size=5,
    temperature=0.0,
    word_timestamps=True,
    # initial_prompt=self.prompt,
    hotwords=self.prompt,
    vad_filter=True,
    vad_parameters=vad_parameters,
    # length_penalty=0.5
)
The "temperature" parameter controls how hard whisper tries to make sense of passages which are difficult to understand. Setting this to a low value can reduce hallucination. We often see such hallucinations at the end of the recording, when whisper is trying (too) hard to make sense of the very last fragment of audio that has no real information in it anymore. I'm curious to see if this parameter influences the crashes that you were experiencing.
You are experiencing the same issues I have with my own homebrew .py script for audio transcription and speaker recognition. It crashes Python in the very same way with long audio files.
+ System
- Provider
[ Name] Application Error
- EventID 1000
[ Qualifiers] 0
Version 0
Level 2
Task 100
Opcode 0
Keywords 0x80000000000000
- TimeCreated
[ SystemTime] 2025-03-13T13:46:46.4939905Z
EventRecordID 136237
Correlation
- Execution
[ ProcessID] 0
[ ThreadID] 0
Channel Application
Computer DESKTOP-00M8NI3
Security
+ EventData
noScribe.exe
0.0.0.0
67ad01d1
KERNELBASE.dll
10.0.19041.5486
09ce69c7
e06d7363
000000000003b699
3c34
01db941bb8ebfd22
G:\SD\noScribe\noScribe.exe
C:\Windows\System32\KERNELBASE.dll
2c583055-807c-4201-9ec7-367b0e9ef285
+ System
- Provider
[ Name] Application Error
- EventID 1000
[ Qualifiers] 0
Version 0
Level 2
Task 100
Opcode 0
Keywords 0x80000000000000
- TimeCreated
[ SystemTime] 2025-03-13T13:46:52.0847497Z
EventRecordID 136238
Correlation
- Execution
[ ProcessID] 0
[ ThreadID] 0
Channel Application
Computer DESKTOP-00M8NI3
Security
+ EventData
noScribe.exe
0.0.0.0
67ad01d1
ucrtbase.dll
10.0.19041.3636
81cf5d89
c0000409
000000000007286e
3c34
01db941bb8ebfd22
G:\SD\noScribe\noScribe.exe
C:\Windows\System32\ucrtbase.dll
f3b6d1cd-3010-481c-8ce8-1f49e662c878
@Mangaclub: Is your script also using Whisper large-v3-turbo together with CUDA? Could you share your script, or at least the part dealing with Whisper?
I'll see what I can do; it's closed source right now as it's for a company project. But I honestly doubt it's Whisper (it happens with all of them). It crashes on the diarisation / pyannote workflow when it's done and sends the speakers back. We both crash with a KERNELBASE.dll crash, which is not trivial, but IMO it sounds more like there's a buffer overflow or something directly inside pyannote or the surrounding components. I need to debug this as well, but I wouldn't be surprised if downgrading pyannote would fix things... maybe...
> It crashes on the diarisation / pyannote workflow when it's done and sends the speakers back.
In this case, I don't think it is the same error, since the crash described here is happening much later in the process.
@Lod3: Have you got it running from source? I have another idea what to test, but no machine with NVIDIA graphics at hand.
I had to order a new PSU. My current one caused hard power cuts whenever I tried to run a transcode or game. It should arrive soon this week or beginning next week.
> I had to order a new PSU
Ah, OK. Good luck with that :)
Little update: I think we have the same problem. It turns out it was the way I called faster-whisper, and I fixed it with multiprocessing.
Since I am not allowed to share source code here, a simple overview: maybe it helps, but it is quite oversimplified.
Initialisation:
# Creating a Whisper model instance
model = WhisperModel(
    model_size_or_path=whisper_model_name,  # e.g., "large-v2"
    device=device,                          # "cuda" or "cpu"
    compute_type=compute_type,              # "float16" or "float32"
    download_root=f"models/whisper-{whisper_model_name}"
)
Multiprocessing Implementation: Transcription runs in a separate process to prevent UI freezing / crashing. Results are passed back via a multiprocessing Queue.
# In start_transcription method:
self.result_queue = multiprocessing.Queue()
self.transcription_process = multiprocessing.Process(
    target=run_transcription_process,
    args=(self.audio_file, self.compute_type_var.get(), ...)
)
self.transcription_process.start()
Result Handling: The main thread periodically checks for results using wait_for_transcription_result. When results arrive, they're displayed and optionally passed to diarization.
Speaker Diarization Process: Speaker diarization identifies who spoke when:
Model Loading:
# Initialize SpeechBrain speaker recognition model
self.speaker_recognition_model = SpeakerRecognition.from_hparams(
    source=speaker_reco_model_name,  # e.g., "speechbrain/spkrec-ecapa-voxceleb"
    savedir=f"models/speechbrain-{speaker_reco_model_name.split('/')[-1]}",
    run_opts={"device": device}
)
Voice Embedding Generation: For each transcription segment, extract the audio and create speaker embeddings. Embeddings are vector representations of voice characteristics (a fuller sketch of this step follows the clustering snippet below).
Clustering for Speaker Identification:
# Group similar voices using clustering
clustering = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit(embeddings)
labels = clustering.labels_
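To make the embedding step concrete, here is a minimal, hypothetical sketch of how per-segment embeddings could be produced with the same stack (SpeechBrain ECAPA + scikit-learn). The segment slicing, mono mixing and 16 kHz resampling are my assumptions, not taken from the original script:

import numpy as np
import torch
import torchaudio
from sklearn.cluster import AgglomerativeClustering
from speechbrain.pretrained import SpeakerRecognition

def label_segments(audio_path, segments, n_clusters=2, device="cpu"):
    """Assign a speaker cluster to each Whisper segment (dicts with 'start'/'end' in seconds)."""
    model = SpeakerRecognition.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb",
        savedir="models/speechbrain-spkrec-ecapa-voxceleb",
        run_opts={"device": device},
    )
    wav, sr = torchaudio.load(audio_path)      # (channels, samples)
    wav = wav.mean(dim=0, keepdim=True)        # mix down to mono
    if sr != 16000:                            # ECAPA models expect 16 kHz input
        wav = torchaudio.functional.resample(wav, sr, 16000)
        sr = 16000
    embeddings = []
    for seg in segments:
        chunk = wav[:, int(seg["start"] * sr):int(seg["end"] * sr)]
        if chunk.shape[1] == 0:                # guard against empty slices
            chunk = torch.zeros(1, sr)
        emb = model.encode_batch(chunk)        # shape (1, 1, emb_dim)
        embeddings.append(emb.squeeze().cpu().numpy())
    # Group similar voices; labels[i] is the speaker cluster of segments[i]
    clustering = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit(np.stack(embeddings))
    return clustering.labels_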
@Mangaclub Thank you for showing more of your implementation; this helps. However, I don't fully understand where exactly in your process the crash happened and what you did to fix it (something with multiprocessing, but how was it implemented before?)...
Okay, since I cannot really share the whole code, here's an AI summary because my brain is mushed from a long night ;)
So here is the AI summary:
The Original Script (Before Improvement):
In the original script, the relevant functions for transcription are:
start_transcription(self): This function is called when the user clicks "Start Transcription". It starts a new thread to execute the actual transcription work in the run_transcription(self) function.
def start_transcription(self):
    if not self.audio_file:
        messagebox.showwarning("Warning", "Please select an audio file first.")
        return
    self.status_var.set("Starting transcription...")
    threading.Thread(target=self.run_transcription, daemon=True).start()
run_transcription(self): This function contains the code to load the Whisper model and perform the transcription.
def run_transcription(self):
    try:
        self.status_var.set("Loading Whisper model...")
        self.root.update_idletasks()
        compute_type = self.compute_type_var.get()
        whisper_model_name = self.whisper_model_var.get()
        device = self.device_var.get()
        model = WhisperModel(
            model_size_or_path=whisper_model_name,
            device=device,
            compute_type=compute_type,
            download_root=f"models/whisper-{whisper_model_name}"
        )
        language = None if self.language_var.get() == "auto" else self.language_var.get()
        self.status_var.set("Transcribing audio...")
        self.root.update_idletasks()
        segments, info = model.transcribe(
            self.audio_file,
            language=language,
            vad_filter=True,
            word_timestamps=True
        )
        self.transcription = {
            "segments": [{
                "start": seg.start,
                "end": seg.end,
                "text": seg.text.strip(),
                "words": [{"word": w.word, "start": w.start, "end": w.end, "probability": w.probability} for w in seg.words]
            } for seg in segments],
            "language": info.language,
            "language_probability": info.language_probability
        }
        self.display_transcription()
        if self.diarize_var.get():
            self.status_var.set("Running speaker diarization...")
            self.root.update_idletasks()
            self.run_speechbrain_diarization()
        else:
            self.status_var.set("Transcription completed.")
    except Exception as e:
        self.status_var.set(f"Error: {str(e)}")
        messagebox.showerror("Error", f"Transcription failed: {str(e)}")
The Original Way of Calling and Handling Whisper:
- The transcription was performed within the run_transcription function.
- This function was executed in a separate thread using threading.Thread.
The Problem with the Original Approach (Using Threads):
While the idea of using a separate thread for the computationally intensive transcription is correct to avoid completely freezing the GUI, it is often not sufficient to prevent significant UI freezing for CPU-bound tasks like Whisper transcription in Python. The reason for this is the Global Interpreter Lock (GIL) in CPython (the standard Python implementation). The GIL allows only one thread to execute Python bytecode at a time. For I/O-bound tasks, the GIL is released, allowing other threads to work. However, Whisper transcription is heavily CPU-bound. Even though a separate thread was used, the Whisper calculations could still consume so much of the CPU time (due to the GIL) that the main GUI thread became unresponsive, leading to the perceived "crash" or freezing. The root.update_idletasks() calls attempted to update the GUI during the process but were not enough in this case.
The New Way of Calling and Handling Whisper (in the Improved Code):
In the improved code, the transcription is offloaded to a separate process instead of a thread:
- run_transcription_process(audio_file, compute_type, whisper_model_name, device, language, result_queue): This new function contains the transcription code and is used as the target for the new process.
def run_transcription_process(audio_file, compute_type, whisper_model_name, device, language, result_queue):
    from faster_whisper import WhisperModel
    import logging
    logging.basicConfig(level=logging.DEBUG)
    try:
        model = WhisperModel(
            model_size_or_path=whisper_model_name,
            device=device,
            compute_type=compute_type,
            download_root=f"models/whisper-{whisper_model_name}"
        )
        language_code = language if language != "auto" else None
        segments, info = model.transcribe(
            audio_file,
            language=language_code,
            vad_filter=True,
            word_timestamps=True
        )
        segments_list = []
        for segment in segments:
            segments_list.append({
                "start": segment.start,
                "end": segment.end,
                "text": segment.text.strip(),
                "words": [{"word": w.word, "start": w.start, "end": w.end, "probability": w.probability} for w in segment.words]
            })
        info_dict = {"language": info.language, "language_probability": info.language_probability}
        results = {"segments": segments_list, "info": info_dict}
        result_queue.put(results)
    except Exception as e:
        logging.error(f"Error during transcription in process: {e}")
        result_queue.put(None)
- Using multiprocessing.Process: In the start_transcription function, a multiprocessing.Process is now created and started:

self.result_queue = multiprocessing.Queue()
self.transcription_process = multiprocessing.Process(target=run_transcription_process, args=(
    self.audio_file,
    self.compute_type_var.get(),
    self.whisper_model_var.get(),
    self.device_var.get(),
    self.language_var.get() if self.language_var.get() != "auto" else None,
    self.result_queue
))
self.transcription_process.start()
self.root.after(500, self.wait_for_transcription_result)
- Communication via multiprocessing.Queue: The transcription results are sent back to the main process via a multiprocessing.Queue (self.result_queue).
- Waiting for the Result in the Main Process: The wait_for_transcription_result function in the main process waits for the transcription process to finish and then retrieves the results from the queue (see the sketch below).
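For reference, a minimal sketch of what such a polling function could look like. The 500 ms interval and the None sentinel come from the snippets above; the method body itself is my assumption, not the original code:

import queue  # only needed for the queue.Empty exception

def wait_for_transcription_result(self):
    # Poll the multiprocessing queue without blocking the Tkinter main loop.
    try:
        results = self.result_queue.get_nowait()
    except queue.Empty:
        self.root.after(500, self.wait_for_transcription_result)  # not done yet, check again later
        return
    self.transcription_process.join()
    if results is None:  # the worker puts None after an exception
        self.status_var.set("Transcription failed (see log).")
        return
    self.transcription = results
    self.display_transcription()
    self.status_var.set("Transcription completed.")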
Why the "Crash" (Freezing) Doesn't Happen Now:
- Bypassing the GIL: Each multiprocessing process has its own Python interpreter and therefore its own GIL. This allows the CPU-intensive Whisper transcription calculations to run in a separate process without being limited by the GIL of the main process (which handles the GUI).
- True Parallelism: Multiprocessing enables true parallel execution on systems with multiple cores or processors, which can potentially reduce the transcription time (compared to the threading solution, which often doesn't benefit from multiple cores due to the GIL).
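One Windows-specific caveat worth adding (my note, not part of the summary above): multiprocessing on Windows uses the spawn start method, so the child process re-imports the main module. The entry point therefore has to be guarded, and everything passed to the worker must be picklable; otherwise the child can try to start the GUI again:

import multiprocessing

if __name__ == "__main__":
    multiprocessing.freeze_support()  # needed for frozen (PyInstaller) builds on Windows
    app = TranscriptionApp()          # hypothetical Tk application class from the script above
    app.root.mainloop()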
In Summary:
The key difference is that the original script used threading, which is often insufficient to prevent GUI freezing / crashing for CPU-bound tasks due to the GIL. The improved code uses multiprocessing, which creates separate processes and thus bypasses the GIL, leading to a more responsive application because the computationally intensive Whisper transcription runs in the background without blocking the main GUI thread.