GlaDOS
Initial dockerfile
With it I was able to run GlaDOS on Windows from an Ubuntu WSL2 terminal: `docker run -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v "/mnt/wslg/:/mnt/wslg/" --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 glados`
Models can either be baked into the image or mounted into /app/models.
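For example, a run command with the models mounted from the host might look like this (a minimal sketch; the `./models` host path is an assumption):

```bash
# Sketch: run a locally built "glados" image with WSLg audio and host models
# mounted into /app/models (the ./models host directory is assumed).
docker run \
  -e "PULSE_SERVER=/mnt/wslg/PulseServer" \
  -v "/mnt/wslg/:/mnt/wslg/" \
  -v "$(pwd)/models:/app/models" \
  --gpus=all --ipc=host \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  glados
```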
Summary by CodeRabbit
- Chores
- Enhanced setup of the development environment for a Python application with a multi-stage build process in the Dockerfile.
- Updated `requirements.docker.txt` with new Python packages like `onnxruntime-gpu`, `numpy`, and more.
- Documentation
- Added Windows-specific run instructions to the `README.md` for running the application in a Docker container using WSL2.
Walkthrough
The recent changes streamline the development and deployment process by introducing a multi-stage Docker build in the Dockerfile, providing detailed Windows-specific Docker instructions in the README.md, and expanding the requirements.docker.txt with additional dependencies.
Changes
| File | Summary |
|---|---|
| Dockerfile | Introduces a multi-stage build process for setting up a development environment. |
| README.md | Added Windows-specific run instructions for Docker setup. |
| requirements.docker.txt | Updated with new Python packages like onnxruntime-gpu, numpy, and more. |
🐰✨ In the code's gentle glow, a Dockerfile did grow, README whispers softly, guidance for WSL flow. New dependencies align, in requirements they shine, A hop, a skip, a leap, our project's quite divine! 🌟📜🐇
Thanks, this seems really valuable. I will test it in the next few days, but this will be a very useful addition to the code base!
IMO, for proper Dockerisation, better separation of concerns is needed in the codebase. With Docker you want to avoid monolithic applications (i.e. multiple pieces of software running in the same container); instead, you want multiple containers, each focused on a single task.
In case of this project, the best approach is to have separate containers for:
- the STT engine, providing a bitstream input (for audio), and a text output API
- the LLM engine, providing the standard llama.cpp API
- the TTS engine, providing a text input API, and a bitstream output
- the "main app", i.e. the framework that ties these three components together
You want to separate these because each container can then be configured individually, for example when a user does not want to (or cannot) run the LLM locally, or wants to run the components on different hosts.
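As an illustrative sketch only (the image names, ports, and environment variables below are hypothetical; nothing in this repo provides them yet), the split could look roughly like this with plain docker commands on a shared network:

```bash
# Hypothetical sketch of the proposed four-container split.
docker network create glados-net

# STT engine: audio in, text out over an HTTP API (port made up)
docker run -d --name stt --network glados-net --gpus=all glados-stt

# LLM engine: llama.cpp server exposing its standard API
docker run -d --name llm --network glados-net --gpus=all glados-llm

# TTS engine: text in, audio out (port made up)
docker run -d --name tts --network glados-net --gpus=all glados-tts

# Main app: ties the three together; endpoint env vars are assumptions
docker run -d --name glados-main --network glados-net \
  -e STT_URL=http://stt:8001 \
  -e LLM_URL=http://llm:8080 \
  -e TTS_URL=http://tts:8002 \
  glados-main
```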
In principle, I totally agree with a Microservice based system.
However, this application uses the sounddevice library in both the ASR and TTS sections of code. I think setting up these properly will increase the complexity a lot.
I see this docker solution as a simple way to get everything running in a single command, which has some nice benefits for non-technical users.
In the future, I see a real need for Microservices, for example a vectorDB for storing memories. But I would wait until we have a definite need before we start with that kind of engineering.
At the moment, you can already fire up any LLM with an OpenAI-compatible API, so that's enough compartmentalization for now.
This is the error I get! Can someone please make this work in Docker with a simple command, for those of us users who don't know anything about programming? Please, I want to talk to the AI....
`docker run -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v "/mnt/wslg/:/mnt/wslg/" --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 glados`
Unable to find image 'glados:latest' locally
docker: Error response from daemon: pull access denied for glados, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
The docker image needs to be built. I'm trying with:
docker build -t glados .
but I'm currently stuck on the line:
=> [base-l 4/4] RUN cd /app && make server LLAMA_CUDA=1 CUDA_DOCKER_ARCH=all
It's taking forever to compile, I cancelled after 10 minutes.
Yep, it takes time to compile. You can set your GPU arch in order to compile only the necessary bits.
So the project ran locally at first, but Whisper kept going to the CPU instead of the GPU, slowing everything down a lot. Trying to fix this with the usual NVIDIA CUDA/cuDNN mess just made things worse, and now for some reason the llama server won't start, though oobabooga, ComfyUI, etc. all run fine. So I went with the Dockerfile so kindly provided, and though the image was built successfully, running it results in:
==========
== CUDA ==
==========
CUDA Version 11.7.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Traceback (most recent call last):
File "/app/glados.py", line 21, in
help?
To build the container:
- run `git submodule update --init --recursive`
- put the models in the models dir
- run `docker build -t glados .` and wait; it will take time to compile the CUDA kernels for all GPUs, or, if you know your GPU arch, set the CUDA_DOCKER_ARCH variable accordingly
- after that, run it with the command in the first post
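Roughly, as a copy-paste sketch of those steps (the `--build-arg` override assumes the Dockerfile exposes `CUDA_DOCKER_ARCH` as a build ARG; if it does not, edit the `make server` line in the Dockerfile instead):

```bash
# Fetch the llama.cpp submodule, then build the image.
git submodule update --init --recursive
# Put your models into ./models first; the build compiles CUDA kernels for
# all GPU architectures by default, which takes a while.
docker build -t glados .

# Optional (assumption: CUDA_DOCKER_ARCH is exposed as a build ARG) - limit
# compilation to your own GPU architecture, e.g. sm_86 for an RTX 30xx card:
# docker build -t glados --build-arg CUDA_DOCKER_ARCH=sm_86 .

# Then run it with the docker run command from the first post.
```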
Thank you for that, but although it is running now, whether locally or in Docker, this line is the bane:
/usr/local/lib/python3.11/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
so it keeps falling back to the CPU, which makes it run very slowly. Llama runs nice and fast on the GPU meanwhile.
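A quick diagnostic (a sketch, assuming Python and onnxruntime are importable inside the container or environment) is to check which execution providers the installed onnxruntime build actually exposes:

```bash
# If this prints only AzureExecutionProvider/CPUExecutionProvider, the
# CPU-only onnxruntime package is installed; CUDAExecutionProvider comes
# from the onnxruntime-gpu package instead.
python3 -c "import onnxruntime; print(onnxruntime.get_available_providers())"
```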
@bitbyteboom looks like the fix is to change onnxruntime to onnxruntime-gpu in requirements.txt
Yes, but this will break Mac installs.
Still working on separating it all out!
Maybe we pip install the needed packages in the Dockerfile, and ignore the requirements.txt?
@dnhkng A possible solution is to fork requirements.txt and make one version for Docker and one for Mac, and leave the default one for Linux, i.e. requirements.docker.txt and requirements.mac.txt
OK, let's do that. Let's make a Docker requirements file using onnxruntime-gpu, and for the Dockerfile, assume CUDA.
That will cover the majority of Windows users, and anyone with another GPU can use it as a starting point. Please add documentation for the Docker install to the README, and we'll merge this in.
Please add links to the CUDA and Docker installation instructions, or we'll have hundreds of issues asking for help.
In the Dockerfile, could we also build just the server in llama.cpp?
> @bitbyteboom looks like the fix is to change onnxruntime to onnxruntime-gpu in requirements.txt
Yes, I saw that and already had onnxruntime-gpu in requirements.txt. Could the issue be that for libwhisper.so I ran "WHISPER_CUDA=1 make libwhisper.so -j"? I had thought that would be what's needed, rather than just "make libwhisper.so" as in the instructions.
Tried it both ways but still getting the ".local/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'"
Don't get me wrong, it's working overall and it's a very fun project to tinker with. Hope to have something positive to add in the future. I'm trying to turn this into a sassy coach for Olympiad subjects for our 7 year old son who is in love with Portal.
Should be unrelated. The onnxruntime is used in TTS, not by the whisper model.
Is that error message from docker?
I think I've mixed the two up. Just starting fresh and going with running local for linux and will try docker in morning for windows.
@dnhkng
> However, this application uses the sounddevice library in both the ASR and TTS sections of code. I think setting up these properly will increase the complexity a lot.
Not necessarily, if the same base image is used for both the ASR and TTS containers. Ideally you'd just pass the appropriate microphone(s) and speaker(s) through to the right container, ensuring that PulseAudio/ALSA only has a single device/card to select (and you'd probably also need to disable the dummy configs).
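For example, with ALSA the device passthrough could be as simple as the following sketch (the `glados-tts` image name is hypothetical; PulseAudio setups would instead mount the Pulse socket, as in the WSL2 run command above):

```bash
# Pass the host sound devices into a (hypothetical) TTS container so ALSA
# inside it only sees the card it should use.
docker run --device /dev/snd --gpus=all glados-tts
```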
> I see this docker solution as a simple way to get everything running in a single command, which has some nice benefits for non-technical users.
True, but as you can see it's already causing issues for people who have little to no knowledge of how to use Docker. It's a real conundrum to resolve, because on one hand you do not want to alienate potential users (there are plenty of examples in the open source scene of arsehole devs driving people away, like the Valetudo project), but on the other hand, you are not a personal helpline to teach people the basics... especially when they are running completely foreign code blindly.
That said, a well-crafted multi-container approach with a provided Docker Compose file could work wonders, and would be much more useful than a monolithic container.
> In the future, I see a real need for Microservices, for example a vectorDB for storing memories. But I would wait until we have a definite need before we start with that kind of engineering.
IMO that's the wrong approach - a definitive architecture needs to be in place before we start tacking on features.
Any other requests in order to make this mergeable? Or is it now not needed due to the Windows install script?
Nope, happy to merge, but I had some issues building the Docker image. Once I know it works reliably, I look forward to merging.
Can `git submodule update --init --recursive` go into the Dockerfile?
Just tried the Dockerfile build, and although it works, the TTS inference is super slow.
2024-05-13 09:15:26.774802449 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
It seems to be running on the CPU, and takes several seconds instead of a few hundred milliseconds. Did you hit this issue? I'm on Windows 11, NVIDIA-SMI 546.01, Driver Version: 546.01, CUDA Version: 12.3.
> Can `git submodule update --init --recursive` go into the Dockerfile?

Better not to; it will be hard to cache.
> Just tried the Dockerfile build, and although it works, the TTS inference is super slow.
> 2024-05-13 09:15:26.774802449 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
> It seems to be running on the CPU, and takes several seconds instead of a few hundred milliseconds. Did you hit this issue? I'm on Windows 11, NVIDIA-SMI 546.01, Driver Version: 546.01, CUDA Version: 12.3.
It worked for me without issues; I can help you debug that on Discord. The TTS slowness was present when there was no onnxruntime-gpu.
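One quick sanity check (a sketch, assuming the NVIDIA Container Toolkit is installed and the CUDA base tag shown is available) is to confirm that containers can see the GPU at all before blaming the Python packages:

```bash
# Should print the same GPU table as on the host; if this fails, the problem
# is the Docker/WSL2 GPU setup rather than onnxruntime.
docker run --rm --gpus=all nvidia/cuda:11.7.1-base-ubuntu22.04 nvidia-smi
```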
OK, let's merge and let people at least test it. I think you need to merge in main again, to align with the current config, espeak binaries, etc., as the Dockerfile branch is 26 commits behind main.
> OK, let's merge and let people at least test it. I think you need to merge in main again, to align with the current config, espeak binaries, etc., as the Dockerfile branch is 26 commits behind main.
Done
One last change: as the requirements for Docker are the same as for the standard CUDA system, let's reuse the requirements_cuda.txt file for Docker, and not create a new, identical requirements.docker.txt file.