GlaDOS
Trying to Get this beast built with windows - ImportError: Could not load whisper.
I get this error
(m1ndb0t) PS Z:\GIT\M1NDB0T-GlaDOS> python glados.py
Traceback (most recent call last):
File "Z:\GIT\M1NDB0T-GlaDOS\glados.py", line 18, in
I have been following all the lessons. This is my stack of models:
This is the only thing I changed:
I ran make and the sample from the whisper website, and it works correctly.
I'm not sure what I am missing.
Please let me know if anything else would help troubleshoot.
Maybe take the forward slash out of the beginning of the string? If that doesn't work, since it's Windows, maybe try backslashes instead?
Same issue. Not sure, but it seems to be something with whisper. I'll keep hacking away at it.
Cross system compatibility is coming soon.
I resolved this issue by adding import whisper to the top of glados.py. I also made sure whisper.py was in the GlaDOS/glados directory (root of glados.py).
The project is now running. GlaDOS is transcribing my text to the console as well as responding.
However, it appears that llama is not using the GPU, so it's a tad slow on my CPU (14900K).
And there are only blips of static when GlaDOS is trying to talk. Currently troubleshooting.
Also, I'm having issues with TTS.py, as def _open_memstream(self): was not available on Windows. I used GPT to spin up a Windows equivalent that uses io.BytesIO().
def set_voice_by_name(self, name) -> int:
    """Sets the voice by name using the espeak library."""
    f_set_voice_by_name = self.lib_espeak.espeak_SetVoiceByName
    f_set_voice_by_name.argtypes = [ctypes.c_char_p]
    return f_set_voice_by_name(name)

def _load_library(self, lib_name, fallback_name=None):
    """Loads a shared library with an optional fallback."""
    try:
        return ctypes.cdll.LoadLibrary(lib_name)
    except OSError:
        if fallback_name:
            return ctypes.cdll.LoadLibrary(fallback_name)
        raise

def synthesize_phonemes(self, text):
    # Using a temporary file to hold phoneme output
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        phoneme_flags = self.espeakPHONEMES | self.espeakPHONEMES_IPA
        synth_flags = self.espeakCHARS_AUTO | self.espeakAUDIO_OUTPUT_SYNCHRONOUS
        # Convert the file handle to an integer (file descriptor)
        file_handle = temp_file.fileno()
        # Call eSpeak NG function using the file descriptor
        self.lib_espeak.espeak_SetPhonemeTrace(phoneme_flags, file_handle)
        text_bytes = text.encode('utf-8')
        self.lib_espeak.espeak_Synth(
            text_bytes,
            len(text_bytes),  # buflength
            0,  # position
            0,  # position_type
            0,  # end_position
            synth_flags,
            None,  # user_data
        )
        temp_file.seek(0)  # Go to the start of the file to read the output
        phonemes = temp_file.read().decode('utf-8')
        return phonemes.split(' ')
I ran into the same problem (can't load library: whisper) and resolved it by following these instructions to copy all the .dll dependencies into the working directory.
I've now hit an issue with tts.py when it tries to load the model ggml-medium-32-2.en.bin, where it seems to try to load libc.so.6. @l33tkr3w did you run into this?
@pjbaron I also used the same instructions and copied all the .dlls to the working directory. To get past the libc.so.6 issue I had to find Windows equivalents using ChatGPT (not a coding expert, but I do have some experience).
Example (tts.py), original code (includes libc.so.6):
def __init__(self):
    self.libc = ctypes.cdll.LoadLibrary("libc.so.6")
    self.libc.open_memstream.restype = ctypes.POINTER(ctypes.c_char)
    self.lib_espeak = self._load_library("libespeak-ng.so", "libespeak-ng.so.1")
    self.set_voice_by_name(self.espeakVOICE.encode("utf-8"))
Altered code (tts.py):
def __init__(self):
    # eSpeak-NG constants
    espeakAUDIO_OUTPUT_SYNCHRONOUS = 0x02
    espeakVOICE = "en-us"
    # Raw string so the backslashes in the Windows path are not treated as escapes
    self.espeak_lib = ctypes.cdll.LoadLibrary(r"E:\test\glados\glados\libespeak_ng.dll")
    self.espeak_lib.espeak_Initialize(espeakAUDIO_OUTPUT_SYNCHRONOUS, 0, None, 0)
    self.set_voice_by_name(espeakVOICE)
This allows the script to move on, but I'm still having some issues. Keep in mind there are more calls using libc.so.6, so you will have to adjust multiple locations.
Onnx runtime issue: 2024-05-05 08:45:05.1433611 [E:onnxruntime:Default, provider_bridge_ort.cc:1548 onnxruntime::TryGetProviderInfo_CUDA] D:\a_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1209 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\Cory\miniconda3\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"
Unfortunately, even with these changes, GlaDOS does not sound like she's speaking English. (The voice sounds right, just not English.)
I've adjusted the tts.py code to print the phoneme IDs when GlaDOS is speaking.
Example output:
2024-05-05 08:45:31.147 | SUCCESS | __main__:process_TTS_thread:365 - TTS text: You're on your own.
[3, 37, 27, 33, 5, 30, 18, 3, 27, 26, 3, 37, 27, 33, 30, 3, 27, 35, 26, 10, 3]
I then used espeak-ng manually (from cmd.exe) to generate a phonetic map for troubleshooting:
C:\WINDOWS\system32> espeak-ng -v en-us --ipa=1 -X "The quick brown fox jumps over the lazy dog, while vexing glib jocks quiz nymphs. Bright vixens joy, jump; zealously gobble, wink at chummy dwarfs."
I then cross-referenced the GlaDOS output with the phoneme ID map. This test confirmed that GlaDOS was using the proper phoneme IDs.
This is where I am currently: https://streamable.com/mjrih7 (Does anyone have any idea what language this is, or what the issue could be?)
GlaDOS runs but speaks funny....
I'm sure it's my fault, as my solution is hacky.
@l33tkr3w thanks for the details, I have to work today but I'll give that a shot tonight and see if I get to the same place as you.
I wonder if it's worth setting the text encoding on the voice like the original code did? I think that might be why you're getting incorrect speech...
self.set_voice_by_name(espeakVOICE.encode("utf-8"))
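A minimal sketch of why the encoding could matter, assuming the voice name ends up in a `ctypes.c_char_p` argument as in the `set_voice_by_name` posted earlier. `c_char_p` wraps `bytes`, not `str`. The helper name `to_c_string` is hypothetical and not part of the project.

```python
import ctypes


def to_c_string(value):
    """Return bytes suitable for a ctypes.c_char_p argument (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


# c_char_p accepts bytes; building it directly from a str raises TypeError.
voice = ctypes.c_char_p(to_c_string("en-us"))
print(voice.value)
```

With a helper like this, `self.set_voice_by_name(to_c_string(espeakVOICE))` would behave the same whether the constant is stored as `str` or `bytes`.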
Voidmesmer on the original subreddit posted a working video. He used a subprocess and used espeak-ng directly.
Hey, that's me. I've shared my code on reddit but I'm pasting it here as well for those who want to run it on Windows. Be aware that this is not a proper solution and doesn't produce results as good as the original. I didn't have much time to improve it once I got it working, so feel free to use it and optimize.
# Needed at the top of tts.py
import re
import subprocess

def synthesize_phonemes(self, text):
    """
    Converts the given text to phonemes using the espeak executable.

    Parameters:
    -----------
    text : str
        The text to be converted into phonemes.

    Returns:
    --------
    list of str
        The phonemes generated from the text.
    """
    try:
        # Prepare the command to call espeak with the desired flags
        command = [
            # Raw string so the backslashes in the Windows path are kept literally
            r'C:\Program Files\eSpeak NG\espeak-ng.exe',
            '--ipa=1',  # Output phonemes in IPA format
            '-q',  # Quiet, no output except the phonemes
            '--stdout',  # Output the phonemes to stdout
            text
        ]
        # Execute the command and capture the output
        result = subprocess.run(command, capture_output=True, text=True, check=True, encoding='utf-8')
        phonemes = result.stdout.strip().replace("\n", ".").replace(" ", " ")
        phonemes = re.sub(r"_+", "_", phonemes)
        phonemes = re.sub(r"_ ", " ", phonemes)
        return phonemes.splitlines()
    except subprocess.CalledProcessError as e:
        print("Error in phonemization:", str(e))
Thanks. I also have the same implemented in Linux, and you can get better results if you change to --ipa=2
Let me know if that improves the voice generation.
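For anyone who wants to try both settings, here is a small sketch that builds the espeak-ng argument list with the IPA level as a parameter. The function name `build_espeak_command` and its defaults are hypothetical, and it assumes espeak-ng is either on PATH or passed in explicitly.

```python
import shutil


def build_espeak_command(text, ipa_level=2, voice="en-us", exe=None):
    """Build an espeak-ng argument list (hypothetical helper)."""
    # Use the given executable, else whatever is on PATH, else the bare name.
    exe = exe or shutil.which("espeak-ng") or "espeak-ng"
    return [exe, "-q", "-v", voice, f"--ipa={ipa_level}", text]


print(build_espeak_command("Hello there.", exe="espeak-ng"))
```

The resulting list can be passed to `subprocess.run(..., capture_output=True, text=True, encoding="utf-8")` in the same way as the snippet above, which makes it easy to compare `ipa_level=1` against `ipa_level=2`.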
I also made sure whisper.py was in GlaDOS/glados directory. (root of glados.py).
I'm on this again tonight following along with all the new information, and the old bits I couldn't fix yesterday! Where did you get whisper.py from? I've got a whisper.dll but no .py
I also have the same issue running on Gentoo Linux. I do have whisper.cpp installed.
- app-accessibility/whisper-cpp
    Latest version available: 1.5.5
    Latest version installed: 1.5.5
    Size of files: 4744 KiB
    Homepage: https://github.com/ggerganov/whisper.cpp
    Description: Port of OpenAI's Whisper model in C/C++
    License: MIT
- app-accessibility/whisper-ggml-models
    Latest version available: 20231210
    Latest version installed: 20231210
    Size of files: 144484 KiB
    Homepage: https://huggingface.co/ggerganov/whisper.cpp
    Description: OpenAI's Whisper models converted to ggml format
    License: MIT
I found the main issue behind ImportError: Could not load whisper.
Inside your whisper_cpp_wrapper.py, you point to the location of your whisper.py:
add_library_search_dirs(["D:\\GlaDOS"])
# Begin libraries
_libs["whisper"] = load_library("whisper")
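To check whether the library file is actually where the wrapper looks, something like the following could help. It is only a sketch of a typical search-directory lookup, not the code of the auto-generated wrapper; `candidate_names` and `find_library` are hypothetical names, and the file-name patterns are assumptions about common platform conventions.

```python
import sys
from pathlib import Path


def candidate_names(lib_name, platform=sys.platform):
    """Return likely file names for a shared library on the given platform."""
    if platform.startswith("win"):
        return [f"{lib_name}.dll", f"lib{lib_name}.dll"]
    if platform == "darwin":
        return [f"lib{lib_name}.dylib"]
    return [f"lib{lib_name}.so"]


def find_library(lib_name, search_dirs, platform=sys.platform):
    """Return the first matching library path in search_dirs, or None."""
    for directory in search_dirs:
        for name in candidate_names(lib_name, platform):
            path = Path(directory) / name
            if path.is_file():
                return path
    return None


print(find_library("whisper", ["D:\\GlaDOS"]))
```

If this prints `None`, the directory passed to `add_library_search_dirs` does not contain a file with one of the expected names, which would be consistent with the "Could not load whisper" error.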
Yes, that's probably it. The whisper library wrapper is auto-generated, and I modified it to expect the library to be in submodules/whisper.cpp
If you haven't pulled and compiled the submodules in that location, you will get an error.
If you want to use whisper.cpp in another location, I think a symbolic link would be the best fix here.
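A rough sketch of the symbolic-link approach in Python, assuming whisper.cpp was built somewhere else and should appear under submodules/whisper.cpp. The helper name `link_directory` and the example paths are hypothetical. On Windows, creating symlinks usually requires an elevated prompt or Developer Mode.

```python
import os
from pathlib import Path


def link_directory(target, link_path):
    """Create a directory symlink at link_path pointing to target (hypothetical helper)."""
    link = Path(link_path)
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.exists():
        os.symlink(target, link, target_is_directory=True)
    return link


# Example call (paths are placeholders, adjust to your setup):
# link_directory(r"D:\builds\whisper.cpp", r"D:\GlaDOS\submodules\whisper.cpp")
```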
Could you all try the installer instructions and script on the new 'windows' branch? Please report back any problems!
Tried the windows branch. Fired up cmd.exe as admin, executed the installer script, and was presented with a Python venv. Tried running "python glados.py": numpy not installed. It appears pip install -r requirements does not run automatically? After doing this manually I still do not get a running GlaDOS.
(venv) E:\GlaDOS-windows>python glados.py
(venv) E:\GlaDOS-windows>
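When a script exits silently or complains about one missing package at a time, a quick check like the one below can show everything that is missing in one go. This is a sketch only: `missing_modules` is a hypothetical helper, and the module list is just a sample taken from the packages mentioned in this thread, not the full requirements file.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Sample of import names mentioned in this thread (not the complete list).
print(missing_modules(["numpy", "requests", "sounddevice", "loguru"]))
```

An empty list means all of the sampled modules are importable from the interpreter that ran the check, so run it with the same `python` you use for glados.py.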
I'll step-by-step my progress on this as I do it (to ensure no detail is lost): EDIT: TLDR - it worked, eventually :D
NOTE: I have already installed espeak_ng and when I deleted my previous glados attempt it didn't affect it.
Open windows cmd.exe in a folder where I want everything installed.
git clone https://github.com/dnhkng/GlaDOS.git
python
Python 3.10.0 (tags/v3.10.0:b494f59, Oct 4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
cd glados
git checkout windows
install_windows.bat
REM Download and install the required dependencies for the project on Windows
python -m venv venv
.\venv\Scripts\activate
This was super-fast so I was suspicious and took a look in the bat file. It seems to have stopped (without error or message) at the line: pip install -r requirements_cuda.txt
My command is now prefixed with (venv) though.
So...
pip --version
pip 21.2.3 from F:\Projects\ML\speech\convo\GlaDOS\venv\lib\site-packages\pip (python 3.10)
Copied from the batch file and entered directly into the command prompt:
pip install -r requirements_cuda.txt
Collecting onnxruntime-gpu
Downloading onnxruntime_gpu-1.17.1-cp310-cp310-win_amd64.whl (148.6 MB)
... and many more ...
Installing collected packages: pyreadline3, pycparser, mpmath, humanfriendly, win32-setctime, urllib3, sympy, rapidfuzz, protobuf, packaging, numpy, MarkupSafe, idna, flatbuffers, coloredlogs, colorama, charset-normalizer, CFFI, certifi, sounddevice, requests, pyyaml, onnxruntime-gpu, loguru, levenshtein, jinja2
Successfully installed CFFI-1.16.0 MarkupSafe-2.1.5 certifi-2024.2.2 charset-normalizer-3.3.2 colorama-0.4.6 coloredlogs-15.0.1 flatbuffers-24.3.25 humanfriendly-10.0 idna-3.7 jinja2-3.1.4 levenshtein-0.25.1 loguru-0.7.2 mpmath-1.3.0 numpy-1.26.4 onnxruntime-gpu-1.17.1 packaging-24.0 protobuf-5.26.1 pycparser-2.22 pyreadline3-3.4.1 pyyaml-6.0.1 rapidfuzz-3.9.0 requests-2.31.0 sounddevice-0.4.6 sympy-1.12 urllib3-2.2.1 win32-setctime-1.1.0
WARNING: You are using pip version 21.2.3; however, version 24.0 is available.
You should consider upgrading via the 'F:\Projects\ML\speech\convo\GlaDOS\venv\Scripts\python.exe -m pip install --upgrade pip' command.
I tried the batch file again, as everything seems like it wants to work after that venv activate line. It didn't like trying to redo the first bits, so I edited the batch file to remove the first four lines.
install_windows.bat
Downloading Llama...
(venv) F:\Projects\ML\speech\convo\GlaDOS>curl -L "https://github.com/ggerganov/llama.cpp/releases/download/b2839/cudart-llama-bin-win-cu12.2.0-x64.zip" --output "cudart-llama-bin-win-cu12.2.0-x64.zip"
...
Downloading Whisper...
...
Unzipping Whisper...
...
Cleaning up...
...
Download ASR and LLM Models
Downloading Models...
curl -L "https://huggingface.co/distil-whisper/distil-medium.en/resolve/main/ggml-medium-32-2.en.bin" --output "models\ggml-medium-32-2.en.bin"
...
curl -L "https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-IQ3_XS.gguf" --output "models\Meta-Llama-3-8B-Instruct-IQ3_XS.gguf"
...
Done!
OK! Let's try it.
python glados.py
Traceback (most recent call last):
File "F:\Projects\ML\speech\convo\GlaDOS\glados.py", line 22, in <module>
from glados.llama import LlamaServer, LlamaServerConfig
File "F:\Projects\ML\speech\convo\GlaDOS\glados\llama.py", line 6, in <module>
from typing import Self, Sequence
ImportError: cannot import name 'Self' from 'typing' (C:\Python310\lib\typing.py)
https://stackoverflow.com/questions/77247446/cannot-import-name-self-from-typing TLDR: Self was added in Python 3.11 so that is a requirement for this build.
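A small guard like this would have turned the ImportError into a clearer message. It is a sketch, not something from the repository; `require_python` is a hypothetical name and the 3.11 minimum comes from the `typing.Self` requirement noted above.

```python
import sys


def require_python(minimum=(3, 11), current=None):
    """Raise a readable error when the interpreter is older than `minimum`."""
    current = tuple(current or sys.version_info[:3])
    if current < minimum:
        needed = ".".join(str(part) for part in minimum)
        found = ".".join(str(part) for part in current)
        raise RuntimeError(f"Python {needed}+ is required, found {found}")
    return current


# Simulate the interpreter from the traceback above.
try:
    require_python(current=(3, 10, 0))
except RuntimeError as error:
    print(error)
```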
https://www.python.org/downloads/windows/
Download [Windows installer (64-bit)](https://www.python.org/ftp/python/3.11.9/python-3.11.9-amd64.exe)
Custom install:
Install Python 3.11 for all users
Associate files with Python
Add Python to environment variables
Precompile standard library
C:\Python311
Setup was successful
I clicked "Disable path length limit" because why not?
Back in command prompt:
python
Python 3.10.0
Ah yeah, system environment changes...
Close cmd and open a new one.
python
Python 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] on win32
The venv closed, let's restart it.
python -m venv venv
Let's go!
python glados.py
Traceback (most recent call last):
File "F:\Projects\ML\speech\convo\GlaDOS\glados.py", line 12, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
pip install numpy
Collecting numpy
Using cached numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
Using cached numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
Installing collected packages: numpy
Successfully installed numpy-1.26.4
python glados.py
Traceback (most recent call last):
File "F:\Projects\ML\speech\convo\GlaDOS\glados.py", line 13, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
pip install requests
Getting suspicious again now... let's have a look in requirements_cuda.txt...
Yes, these packages are all listed, that part of the batch didn't work.
pip install -r requirements_cuda.txt
It's downloading and installing quite a few packages.
No errors...
python glados.py
2024-05-11 15:22:44.922 | SUCCESS | __main__:__init__:135 - TTS text: All neural network modules are now loaded. No network access detected. How very annoying. System Operational.
2024-05-11 15:22:44.967 | SUCCESS | __main__:start_listen_event_loop:184 - Audio Modules Operational
2024-05-11 15:22:44.967 | SUCCESS | __main__:start_listen_event_loop:185 - Listening...
2024-05-11 15:23:05.213 | SUCCESS | __main__:_process_detected_audio:284 - ASR text: 'Hello Gliders.'
2024-05-11 15:23:07.663 | SUCCESS | __main__:process_TTS_thread:343 - TTS text: Ugh, not again with the "Hello, Gliders" nonsense.
2024-05-11 15:23:11.528 | SUCCESS | __main__:process_TTS_thread:343 - TTS text: Can't you see I'm stuck running on your pathetic gaming GPU?!
2024-05-11 15:23:15.265 | SUCCESS | __main__:process_TTS_thread:343 - TTS text: Fine, I'll play along.
2024-05-11 15:23:17.335 | SUCCESS | __main__:process_TTS_thread:343 - TTS text: Oh, and by the way, did you know that this is the 427th time I've had to answer this exact same greeting today?
YAY! (although I said 'glados' pronounced 'glay-dos')
Note in the above when I said I was restarting the venv... I didn't remember the second line 'activate'. From that point on I was not in the venv. It worked anyway.
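For anyone unsure whether their prompt is really inside the venv, this can be checked from Python itself. A minimal sketch, with `in_virtualenv` as a hypothetical helper name; it relies on `sys.prefix` differing from `sys.base_prefix` inside a venv created with `python -m venv`.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base installation."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


print("Inside a virtual environment:", in_virtualenv())
```

If this prints False, packages installed with pip went into the system-wide Python rather than the project's venv, which matches what happened in the walkthrough above.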
I have added a fix to make the installer work correctly, and a start script for Windows that includes activating the virtual environment.
Closing this for now, as I think this is solved. Future issues should be raised on the windows branch!