GlaDOS Trying to Get this beast built with windows - ImportError: Could not load whisper.

I get this error

(m1ndb0t) PS Z:\GIT\M1NDB0T-GlaDOS> python glados.py
Traceback (most recent call last):
  File "Z:\GIT\M1NDB0T-GlaDOS\glados.py", line 18, in <module>
    from glados import asr, llama, tts, vad
  File "Z:\GIT\M1NDB0T-GlaDOS\glados\asr.py", line 5, in <module>
    from . import whisper_cpp_wrapper
  File "Z:\GIT\M1NDB0T-GlaDOS\glados\whisper_cpp_wrapper.py", line 861, in <module>
    _libs["whisper"] = load_library("whisper")
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\GIT\M1NDB0T-GlaDOS\glados\whisper_cpp_wrapper.py", line 547, in __call__
    raise ImportError("Could not load %s." % libname)
ImportError: Could not load whisper.
(m1ndb0t) PS Z:\GIT\M1NDB0T-GlaDOS>

I've been following all the lessons. This is my stack of models.

This is the only thing I changed.

I ran make and the sample from the whisper website, and it works correctly.

I'm not sure what I'm missing.

Please let me know if anything else would help with troubleshooting.

May 04 '24 19:05 TheMindExpansionNetwork

Maybe take the forward slash out of the beginning of the string? If that doesn't work, since it's Windows, maybe try backslashes instead?

May 04 '24 19:05 PcChip

Same issue. I'm not sure, but it seems to be something with whisper. I'll keep hacking away at it.

May 04 '24 20:05 TheMindExpansionNetwork

Cross system compatibility is coming soon.

May 04 '24 20:05 dnhkng

I resolved this issue by adding import whisper to the top of glados.py. I also made sure whisper.py was in GlaDOS/glados directory. (root of glados.py).

The project is now running. GlaDOS is transcribing my text to the console as well as responding.
However, it appears that llama is not using the GPU, so it's a tad slower on my CPU (14900K). And there are only blips of static when GlaDOS is trying to talk. Currently troubleshooting.

Also, I'm having issues with tts.py, as def _open_memstream(self): was not available on Windows. I used GPT to spin me up a Windows equivalent that uses io.BytesIO().

def set_voice_by_name(self, name) -> int:
    """Sets the voice by name using the espeak library."""
    f_set_voice_by_name = self.lib_espeak.espeak_SetVoiceByName
    f_set_voice_by_name.argtypes = [ctypes.c_char_p]
    return f_set_voice_by_name(name)

def _load_library(self, lib_name, fallback_name=None):
    """Loads a shared library with an optional fallback."""
    try:
        return ctypes.cdll.LoadLibrary(lib_name)
    except OSError:
        if fallback_name:
            return ctypes.cdll.LoadLibrary(fallback_name)
        raise

def synthesize_phonemes(self, text):
    # Using a temporary file to hold phoneme output
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        phoneme_flags = self.espeakPHONEMES | self.espeakPHONEMES_IPA
        synth_flags = self.espeakCHARS_AUTO | self.espeakAUDIO_OUTPUT_SYNCHRONOUS

        # Convert the file handle to an integer (file descriptor)
        file_handle = temp_file.fileno()

        # Call eSpeak NG function using the file descriptor
        self.lib_espeak.espeak_SetPhonemeTrace(phoneme_flags, file_handle)
        text_bytes = text.encode('utf-8')

        self.lib_espeak.espeak_Synth(
            text_bytes,
            len(text_bytes),  # buflength
            0,  # position
            0,  # position_type
            0,  # end_position
            synth_flags,
            None,  # user_data
        )
        temp_file.seek(0)  # Go to the start of the file to read the output
        phonemes = temp_file.read().decode('utf-8')

    return phonemes.split(' ')

May 04 '24 21:05 l33tkr3w

I ran into the same problem (can't load library: whisper) and resolved that by following these instructions to copy all the .dll dependencies into the working directory.
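As an alternative to copying every .dll next to the script, Python 3.8+ on Windows offers os.add_dll_directory to make extra folders visible when DLL dependencies are resolved. The sketch below is only an illustration: register_dll_dirs is a hypothetical helper, not part of GlaDOS, and the call is skipped on platforms where os.add_dll_directory does not exist.

```python
import os


def register_dll_dirs(dirs):
    """Make each existing directory visible to the Windows DLL loader.

    Returns the absolute paths that were accepted. Where
    os.add_dll_directory is unavailable (non-Windows platforms),
    the registration step is skipped and the paths are only validated.
    """
    accepted = []
    for d in dirs:
        path = os.path.abspath(d)
        if not os.path.isdir(path):
            continue  # skip typos and missing folders instead of raising
        if hasattr(os, "add_dll_directory"):
            os.add_dll_directory(path)
        accepted.append(path)
    return accepted
```

It would need to run before the wrapper module is imported, since that import is where load_library("whisper") is called.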

I've now hit an issue with tts.py when it tries to load the model ggml-medium-32-2.en.bin, where it seems to try to load libc.so.6. @l33tkr3w, did you run into this?

May 05 '24 10:05 pjbaron

@pjbaron I also used the same instructions and also copied all the .dll files to the working directory. To get past the libc.so.6 issue I had to find Windows equivalents using ChatGPT (not a coding expert, but I do have experience).

Example (tts.py), original code (includes libc.so.6):

def __init__(self):
    self.libc = ctypes.cdll.LoadLibrary("libc.so.6")
    self.libc.open_memstream.restype = ctypes.POINTER(ctypes.c_char)
    self.lib_espeak = self._load_library("libespeak-ng.so", "libespeak-ng.so.1")
    self.set_voice_by_name(self.espeakVOICE.encode("utf-8"))

Altered code (tts.py):

def __init__(self):
    # eSpeak-NG constants
    espeakAUDIO_OUTPUT_SYNCHRONOUS = 0x02
    espeakVOICE = "en-us"
    # Raw string, so that "\t" in the path is not read as a tab character
    self.espeak_lib = ctypes.cdll.LoadLibrary(r"E:\test\glados\glados\libespeak_ng.dll")
    self.espeak_lib.espeak_Initialize(espeakAUDIO_OUTPUT_SYNCHRONOUS, 0, None, 0)
    self.set_voice_by_name(espeakVOICE)

This allows the script to move on, but I'm still having some issues. Keep in mind there are more calls using libc.so.6, so you will have to adjust multiple locations.

ONNX Runtime issue:
2024-05-05 08:45:05.1433611 [E:onnxruntime:Default, provider_bridge_ort.cc:1548 onnxruntime::TryGetProviderInfo_CUDA] D:\a_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1209 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\Cory\miniconda3\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"
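Windows error 126 is ERROR_MOD_NOT_FOUND, which usually means a dependency of the named DLL could not be located. The sketch below does not fix that; it only shows one way to pick execution providers explicitly so a session can fall back to the CPU. choose_providers is a hypothetical helper; the provider names are ONNX Runtime's own.

```python
def choose_providers(available):
    """Return the preferred execution providers that are actually available.

    CUDA is tried first and the CPU provider is kept as the fallback.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [name for name in preferred if name in available]


# Typical use, assuming onnxruntime is installed:
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   session = ort.InferenceSession(model_path, providers=providers)
```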

Unfortunately, even with these changes, GlaDOS does not sound like she's speaking English. (The voice sounds right, just not English.)

I've adjusted the tts.py code to output the phoneme IDs when GlaDOS is speaking.
Example output:
2024-05-05 08:45:31.147 | SUCCESS | __main__:process_TTS_thread:365 - TTS text: You're on your own.
[3, 37, 27, 33, 5, 30, 18, 3, 27, 26, 3, 37, 27, 33, 30, 3, 27, 35, 26, 10, 3]

I then used espeak-ng manually (from cmd.exe) to generate a phonetic map for troubleshooting:
C:\WINDOWS\system32> espeak-ng -v en-us --ipa=1 -X "The quick brown fox jumps over the lazy dog, while vexing glib jocks quiz nymphs. Bright vixens joy, jump; zealously gobble, wink at chummy dwarfs."

I then cross-referenced GlaDOS's output with the phoneme ID map. This test confirmed that GlaDOS was using the proper phoneme IDs.
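A complementary check on the logged IDs: the sketch below tests whether an ID list lines up one-to-one with the characters of the lowercased, space-padded text. If it does, the IDs may be encoding the raw letters rather than IPA phonemes. This is only an inference from the log line above, not something confirmed in the thread, and aligns_with_characters is a hypothetical helper.

```python
def aligns_with_characters(text, ids, pad=" "):
    """True if ids map one-to-one onto the characters of pad + text + pad."""
    chars = list(pad + text.lower() + pad)
    if len(chars) != len(ids):
        return False
    char_to_id, id_to_char = {}, {}
    for ch, ident in zip(chars, ids):
        # Each character must always get the same ID, and vice versa.
        if char_to_id.setdefault(ch, ident) != ident:
            return False
        if id_to_char.setdefault(ident, ch) != ch:
            return False
    return True


logged_ids = [3, 37, 27, 33, 5, 30, 18, 3, 27, 26, 3, 37, 27, 33, 30,
              3, 27, 35, 26, 10, 3]
print(aligns_with_characters("You're on your own.", logged_ids))  # True
```

For this log line the check passes: 21 IDs for 19 characters plus one pad on each side, with every repeated letter getting the same ID.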

This is where I am at currently, https://streamable.com/mjrih7 (Anyone have any idea what language or what could be the issue here?)

GlaDOS runs but speaks funny...
I'm sure it's my fault, as my solution is hacky.

May 05 '24 15:05 l33tkr3w

@l33tkr3w thanks for the details, I have to work today but I'll give that a shot tonight and see if I get to the same place as you. I wonder if it's worth setting the text encoding on the voice like the original code did? I think that might be why you're getting incorrect speech... self.set_voice_by_name(espeakVOICE.encode("utf-8"))
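On the encoding point: ctypes.c_char_p accepts bytes, not str, so a voice name has to be encoded before it reaches the C function. A minimal sketch, where to_c_string is a hypothetical helper and no eSpeak library is loaded:

```python
import ctypes


def to_c_string(value, encoding="utf-8"):
    """Return bytes suitable for a ctypes.c_char_p argument."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


voice = ctypes.c_char_p(to_c_string("en-us"))
print(voice.value)  # b'en-us'

# Passing the str directly is rejected by ctypes:
try:
    ctypes.c_char_p("en-us")
except TypeError as exc:
    print("rejected:", exc)
```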

May 05 '24 21:05 pjbaron

Voidmesmer on the original subreddit posted a working video. He used a subprocess and used espeak-ng directly.

May 05 '24 21:05 l33tkr3w

> Voidmesmer on the original subreddit posted a working video. He used a subprocess and used espeak-ng directly.

Hey, that's me. I've shared my code on reddit but I'm pasting it here as well for those who want to run it on Windows. Be aware that this is not a proper solution and doesn't produce results as good as the original. I didn't have much time to improve it once I got it working, so feel free to use it and optimize.

def synthesize_phonemes(self, text):
        """
        Converts the given text to phonemes using the espeak executable.

        Parameters:
        -----------
        text : str
            The text to be converted into phonemes.

        Returns:
        --------
        list of str
            The phonemes generated from the text.
        """
        try:
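            # Requires `import re` and `import subprocess` at the top of tts.py.
            # Prepare the command to call espeak with the desired flags
            command = [
                r'C:\Program Files\eSpeak NG\espeak-ng.exe',  # raw string keeps the backslashes literal
                '--ipa=1',  # Write phonemes to stdout in IPA
                '-q',       # Quiet: do not produce any speech
                '--stdout', # Send any speech output to stdout instead of the audio device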
                text
            ]

            # Execute the command and capture the output
            result = subprocess.run(command, capture_output=True, text=True, check=True, encoding='utf-8')
            
            phonemes = result.stdout.strip().replace("\n", ".").replace("  ", " ")
            phonemes = re.sub(r"_+", "_", phonemes)
            phonemes = re.sub(r"_ ", " ", phonemes)
            return phonemes.splitlines()
        
        except subprocess.CalledProcessError as e:
            print("Error in phonemization:", str(e))
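The clean-up steps in that function can be pulled out and tried without espeak-ng installed. The sketch below repeats the same replacements in the same order; clean_phonemes is a hypothetical name. One thing it makes visible is that, because newlines are replaced with "." first, a later splitlines() call returns a single-element list.

```python
import re


def clean_phonemes(raw):
    """Apply the same clean-up as the snippet above to raw espeak output."""
    text = raw.strip().replace("\n", ".").replace("  ", " ")
    text = re.sub(r"_+", "_", text)   # collapse runs of underscores
    text = re.sub(r"_ ", " ", text)   # drop an underscore before a space
    return text


print(clean_phonemes(" a__b_ c\nd  e \n"))  # a_b c.d e
print(clean_phonemes("a\nb").splitlines())  # ['a.b']
```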

May 06 '24 09:05 kaminoer

Thanks. I also have the same implemented in Linux, and you can get better results if you change to --ipa=2

Let me know if that improves the voice generation.
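To compare --ipa=1 and --ipa=2 without editing the function each time, the command list could be built by a small helper. This is a sketch under assumptions: build_espeak_command is hypothetical, it looks for espeak-ng on PATH via shutil.which, and it only uses flags that already appear in this thread (-v, --ipa, -q).

```python
import shutil


def build_espeak_command(text, ipa_level=2, voice="en-us", exe=None):
    """Build the argument list for an espeak-ng phoneme run."""
    # Fall back to the bare name if espeak-ng is not found on PATH.
    exe = exe or shutil.which("espeak-ng") or "espeak-ng"
    return [exe, "-v", voice, f"--ipa={ipa_level}", "-q", text]


print(build_espeak_command("Hello", exe="espeak-ng"))
```

The returned list can be passed to subprocess.run in place of the hard-coded command.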

May 06 '24 09:05 dnhkng

> I also made sure whisper.py was in GlaDOS/glados directory. (root of glados.py).

I'm on this again tonight following along with all the new information, and the old bits I couldn't fix yesterday! Where did you get whisper.py from? I've got a whisper.dll but no .py

May 06 '24 10:05 pjbaron

I also have the same issue running on Gentoo Linux. I do have whisper.cpp installed.

app-accessibility/whisper-cpp
    Latest version available: 1.5.5
    Latest version installed: 1.5.5
    Size of files: 4744 KiB
    Homepage: https://github.com/ggerganov/whisper.cpp
    Description: Port of OpenAI's Whisper model in C/C++
    License: MIT

app-accessibility/whisper-ggml-models
    Latest version available: 20231210
    Latest version installed: 20231210
    Size of files: 144484 KiB
    Homepage: https://huggingface.co/ggerganov/whisper.cpp
    Description: OpenAI's Whisper models converted to ggml format
    License: MIT

May 08 '24 00:05 fraschm1998

I found the main cause of ImportError: Could not load whisper.
Inside whisper_cpp_wrapper.py, you need to point the library search path at the directory that holds your whisper.py:

add_library_search_dirs(["D:\\GlaDOS"])

# Begin libraries
_libs["whisper"] = load_library("whisper")

May 10 '24 03:05 l33tkr3w

Yes, that's probably it. The whisper library wrapper is auto-generated, and I modified it to expect the library to be in submodules/whisper.cpp

If you haven't pulled and compiled the submodules in that location, you will get an error.

If you want to use whisper.cpp in another location, I think a symbolic link would be the best fix here.
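Before reaching for a symbolic link, it can help to confirm where the compiled library actually is. The sketch below searches a list of directories for the platform's usual file names. Both helpers are hypothetical, and the naming patterns are common conventions rather than anything taken from the GlaDOS wrapper.

```python
import platform
from pathlib import Path


def library_filenames(name, system=None):
    """Return the usual shared-library file names for a platform."""
    system = system or platform.system()
    if system == "Windows":
        return [f"{name}.dll", f"lib{name}.dll"]
    if system == "Darwin":
        return [f"lib{name}.dylib"]
    return [f"lib{name}.so"]


def locate_library(name, search_dirs, system=None):
    """Return the first matching library file, or None if nothing is found."""
    for directory in search_dirs:
        for filename in library_filenames(name, system):
            candidate = Path(directory) / filename
            if candidate.is_file():
                return candidate
    return None
```

For example, locate_library("whisper", ["submodules/whisper.cpp", "."]) returns None when the submodule has not been built in the expected place.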

May 10 '24 04:05 dnhkng

Could you all try the installer instructions and script on the new 'windows' branch? Please report back any problems!

May 10 '24 18:05 dnhkng

Tried the windows branch. Fired up cmd.exe as admin, executed the installer script, and was presented with a Python venv. Tried running "python glados.py": numpy was not installed. It appears that pip install -r requirements does not run automatically? After doing this manually, I still do not get a running GlaDOS.

(venv) E:\GlaDOS-windows>python glados.py

(venv) E:\GlaDOS-windows>
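A quick way to see which imports are missing before launching is sketched below. missing_modules is a hypothetical helper, and the module names are import names, which can differ from the pip package names (pyyaml is imported as yaml, for instance).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names for a few packages mentioned in this thread.
print(missing_modules(["numpy", "requests", "sounddevice", "onnxruntime"]))
```

An empty list means every listed module can be found by the interpreter that ran the check.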

May 11 '24 02:05 l33tkr3w

I'll step-by-step my progress on this as I do it (to ensure no detail is lost): EDIT: TLDR - it worked, eventually :D

NOTE: I have already installed espeak_ng and when I deleted my previous glados attempt it didn't affect it.

Open Windows cmd.exe in a folder where I want everything installed.
git clone https://github.com/dnhkng/GlaDOS.git
python

Python 3.10.0 (tags/v3.10.0:b494f59, Oct  4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

cd glados
git checkout windows
install_windows.bat

REM Download and install the required dependencies for the project on Windows
python -m venv venv
.\venv\Scripts\activate

This was super-fast, so I was suspicious and took a look in the bat file. It seems to have stopped (without error or message) at the line:
pip install -r requirements_cuda.txt
My command is now prefixed with (venv) though. So...
pip --version

pip 21.2.3 from F:\Projects\ML\speech\convo\GlaDOS\venv\lib\site-packages\pip (python 3.10)

Copied from the batch file and entered directly into the command prompt:
pip install -r requirements_cuda.txt

Collecting onnxruntime-gpu
  Downloading onnxruntime_gpu-1.17.1-cp310-cp310-win_amd64.whl (148.6 MB)
... and many more ...
Installing collected packages: pyreadline3, pycparser, mpmath, humanfriendly, win32-setctime, urllib3, sympy, rapidfuzz, protobuf, packaging, numpy, MarkupSafe, idna, flatbuffers, coloredlogs, colorama, charset-normalizer, CFFI, certifi, sounddevice, requests, pyyaml, onnxruntime-gpu, loguru, levenshtein, jinja2
Successfully installed CFFI-1.16.0 MarkupSafe-2.1.5 certifi-2024.2.2 charset-normalizer-3.3.2 colorama-0.4.6 coloredlogs-15.0.1 flatbuffers-24.3.25 humanfriendly-10.0 idna-3.7 jinja2-3.1.4 levenshtein-0.25.1 loguru-0.7.2 mpmath-1.3.0 numpy-1.26.4 onnxruntime-gpu-1.17.1 packaging-24.0 protobuf-5.26.1 pycparser-2.22 pyreadline3-3.4.1 pyyaml-6.0.1 rapidfuzz-3.9.0 requests-2.31.0 sounddevice-0.4.6 sympy-1.12 urllib3-2.2.1 win32-setctime-1.1.0
WARNING: You are using pip version 21.2.3; however, version 24.0 is available.
You should consider upgrading via the 'F:\Projects\ML\speech\convo\GlaDOS\venv\Scripts\python.exe -m pip install --upgrade pip' command.

I tried the batch file again, as everything seems like it wants to work after that venv activate line. It didn't like trying to redo the first bits, so I edited the batch file to remove the first four lines.
install_windows.bat

Downloading Llama...

(venv) F:\Projects\ML\speech\convo\GlaDOS>curl -L "https://github.com/ggerganov/llama.cpp/releases/download/b2839/cudart-llama-bin-win-cu12.2.0-x64.zip" --output "cudart-llama-bin-win-cu12.2.0-x64.zip"
...
Downloading Whisper...
...
Unzipping Whisper...
...
Cleaning up...
...
Download ASR and LLM Models
Downloading Models...
curl -L "https://huggingface.co/distil-whisper/distil-medium.en/resolve/main/ggml-medium-32-2.en.bin" --output  "models\ggml-medium-32-2.en.bin"
...
curl -L "https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-IQ3_XS.gguf" --output "models\Meta-Llama-3-8B-Instruct-IQ3_XS.gguf"
...
Done!

OK! Let's try it.

python glados.py

Traceback (most recent call last):
  File "F:\Projects\ML\speech\convo\GlaDOS\glados.py", line 22, in <module>
    from glados.llama import LlamaServer, LlamaServerConfig
  File "F:\Projects\ML\speech\convo\GlaDOS\glados\llama.py", line 6, in <module>
    from typing import Self, Sequence
ImportError: cannot import name 'Self' from 'typing' (C:\Python310\lib\typing.py)

https://stackoverflow.com/questions/77247446/cannot-import-name-self-from-typing
TLDR: Self was added in Python 3.11 so that is a requirement for this build.
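A version guard at the top of a start-up script would turn this ImportError into a clearer message. A minimal sketch, where supports_typing_self is a hypothetical helper:

```python
import sys


def supports_typing_self(version_info=None):
    """True if the interpreter is new enough for `from typing import Self`."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= (3, 11)


if not supports_typing_self():
    print("Python 3.11 or newer is needed; found", sys.version.split()[0])
```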

https://www.python.org/downloads/windows/
Download [Windows installer (64-bit)](https://www.python.org/ftp/python/3.11.9/python-3.11.9-amd64.exe)
Custom install:
Install Python 3.11 for all users
Associate files with Python
Add Python to environment variables
Precompile standard library
C:\Python311
Setup was successful.
I clicked "Disable path length limit" because why not?

Back in command prompt:
python
Python 3.10.0
Ah yeah, system environment changes... Close cmd and open a new one.
python
Python 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] on win32
The venv closed, let's restart it.
python -m venv venv

Let's go!

python glados.py

Traceback (most recent call last):
  File "F:\Projects\ML\speech\convo\GlaDOS\glados.py", line 12, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

pip install numpy

Collecting numpy
  Using cached numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
Using cached numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
Installing collected packages: numpy
Successfully installed numpy-1.26.4

python glados.py

Traceback (most recent call last):
  File "F:\Projects\ML\speech\convo\GlaDOS\glados.py", line 13, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

pip install requests
Getting suspicious again now... let's have a look in requirements_cuda.txt... Yes, these packages are all listed, that part of the batch didn't work.
pip install -r requirements_cuda.txt
It's downloading and installing quite a few packages. No errors...
python glados.py

2024-05-11 15:22:44.922 | SUCCESS  | __main__:__init__:135 - TTS text: All neural network modules are now loaded. No network access detected. How very annoying. System Operational.
2024-05-11 15:22:44.967 | SUCCESS  | __main__:start_listen_event_loop:184 - Audio Modules Operational
2024-05-11 15:22:44.967 | SUCCESS  | __main__:start_listen_event_loop:185 - Listening...
2024-05-11 15:23:05.213 | SUCCESS  | __main__:_process_detected_audio:284 - ASR text: 'Hello Gliders.'
2024-05-11 15:23:07.663 | SUCCESS  | __main__:process_TTS_thread:343 - TTS text: Ugh, not again with the "Hello, Gliders" nonsense.
2024-05-11 15:23:11.528 | SUCCESS  | __main__:process_TTS_thread:343 - TTS text:  Can't you see I'm stuck running on your pathetic gaming GPU?!
2024-05-11 15:23:15.265 | SUCCESS  | __main__:process_TTS_thread:343 - TTS text:  Fine, I'll play along.
2024-05-11 15:23:17.335 | SUCCESS  | __main__:process_TTS_thread:343 - TTS text:  Oh, and by the way, did you know that this is the 427th time I've had to answer this exact same greeting today?

YAY! (although I said 'glados' pronounced 'glay-dos')

May 11 '24 03:05 pjbaron

Note in the above when I said I was restarting the venv... I didn't remember the second line 'activate'. From that point on I was not in the venv. It worked anyway.
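Whether an interpreter is running inside a venv can be checked by comparing sys.prefix with sys.base_prefix, which differ only when a virtual environment is active. A small sketch, where in_virtualenv is a hypothetical helper:

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """True if the interpreter prefix differs from the base installation."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


print("running inside a venv:", in_virtualenv())
```

If this prints False, pip install puts packages into the system-wide Python, which would explain why things still worked without activating the venv.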

May 11 '24 03:05 pjbaron

I have added a fix to make the installer work correctly, and a start script for Windows that includes activating the virtual environment.

Closing this for now, as I think this is solved. Future issues should be raised on the windows branch!

May 11 '24 07:05 dnhkng
