GlaDOS
Initial dockerfile
With it I was able to run GlaDOS on Windows from an Ubuntu WSL2 terminal: `docker run -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v "/mnt/wslg/:/mnt/wslg/" --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 glados`
Models can either be baked into the image or mounted into /app/models.
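For example, a run command with the models mounted from the host might look like this (a minimal sketch; the `./models` host path is an assumption):

```bash
# Sketch: run a locally built "glados" image with WSLg audio and host models
# mounted into /app/models (the ./models host directory is assumed).
docker run \
  -e "PULSE_SERVER=/mnt/wslg/PulseServer" \
  -v "/mnt/wslg/:/mnt/wslg/" \
  -v "$(pwd)/models:/app/models" \
  --gpus=all --ipc=host \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  glados
```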
Summary by CodeRabbit
- Chores
- Enhanced setup of the development environment for a Python application with a multi-stage build process in the Dockerfile.
- Updated `requirements.docker.txt` with new Python packages like `onnxruntime-gpu`, `numpy`, and more.
- Documentation
- Added Windows-specific run instructions to the `README.md` for running the application in a Docker container using WSL2.
Walkthrough
The recent changes streamline the development and deployment process by introducing a multi-stage Docker build in the Dockerfile, providing detailed Windows-specific Docker instructions in the README.md, and expanding the requirements.docker.txt with additional dependencies.
Changes
| File | Summary |
|---|---|
| Dockerfile | Introduces a multi-stage build process for setting up a development environment. |
| README.md | Added Windows-specific run instructions for Docker setup. |
| requirements.docker.txt | Updated with new Python packages like onnxruntime-gpu, numpy, and more. |
🐰✨ In the code's gentle glow, a Dockerfile did grow, README whispers softly, guidance for WSL flow. New dependencies align, in requirements they shine, A hop, a skip, a leap, our project's quite divine! 🌟📜🐇
Thanks, this seems really valuable. I will test it in the next few days, but this will be a very useful addition to the code base!
IMO, for proper Dockerisation, better separation of concerns is needed in the codebase. With Docker you want to avoid monolithic applications (i.e. multiple pieces of software running in the same container); instead, you want multiple containers, each focused on a single task.
In case of this project, the best approach is to have separate containers for:
- the STT engine, providing a bitstream input (for audio), and a text output API
- the LLM engine, providing the standard llama.cpp API
- the TTS engine, providing a text input API, and a bitstream output
- the "main app", i.e. the framework that ties these three components together
You want to separate these because each container can then be configured individually, for example when a user does not want to (or cannot) run the LLM locally, or wants to run the components on different hosts.
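As an illustrative sketch only (the image names, ports, and environment variables below are hypothetical; nothing in this repo provides them yet), the split could look roughly like this with plain docker commands on a shared network:

```bash
# Hypothetical sketch of the proposed four-container split.
docker network create glados-net

# STT engine: audio in, text out over an HTTP API (port made up)
docker run -d --name stt --network glados-net --gpus=all glados-stt

# LLM engine: llama.cpp server exposing its standard API
docker run -d --name llm --network glados-net --gpus=all glados-llm

# TTS engine: text in, audio out (port made up)
docker run -d --name tts --network glados-net --gpus=all glados-tts

# Main app: ties the three together; endpoint env vars are assumptions
docker run -d --name glados-main --network glados-net \
  -e STT_URL=http://stt:8001 \
  -e LLM_URL=http://llm:8080 \
  -e TTS_URL=http://tts:8002 \
  glados-main
```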
In principle, I totally agree with a Microservice based system.
However, this application uses the sounddevice library in both the ASR and TTS sections of code. I think setting up these properly will increase the complexity a lot.
I see this docker solution as a simple way to get everything running in a single command, which has some nice benefits for non-technical users.
In the future, I see a real need for Microservices, for example a vectorDB for storing memories. But I would wait until we have a definite need before we start with that kind of engineering.
At the moment, you can already fire up any LLM with an OpenAI-compatible API, so that's enough compartmentalization for now.
This is the error I get! Can someone please make this work in Docker with a simple command, for those of us users who don't know anything about programming? Please, I want to talk to the AI....
`docker run -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v "/mnt/wslg/:/mnt/wslg/" --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 glados`
Unable to find image 'glados:latest' locally
docker: Error response from daemon: pull access denied for glados, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
The docker image needs to be built. I'm trying with:
docker build -t glados .
but I'm currently stuck on the line:
=> [base-l 4/4] RUN cd /app && make server LLAMA_CUDA=1 CUDA_DOCKER_ARCH=all
It's taking forever to compile, I cancelled after 10 minutes.
Yep, it takes time to compile. You can set your GPU arch in order to compile only the necessary bits.
So the project ran locally at first, but Whisper kept going to the CPU instead of the GPU, slowing everything down a lot. Trying to fix this with the usual NVIDIA CUDA/cuDNN mess just made things worse, and now for some reason the llama server won't start, though oobabooga, ComfyUI, etc. all run fine. So I went with the Dockerfile so kindly provided, and though the image was built successfully, running it results in:
==========
== CUDA ==
==========
CUDA Version 11.7.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Traceback (most recent call last):
File "/app/glados.py", line 21, in
help?
To build the container:
- run `git submodule update --init --recursive`
- put the models in the models dir
- run `docker build -t glados .` and wait; it will take time to compile the CUDA kernels for all GPUs, or, if you know your GPU arch, set the CUDA_DOCKER_ARCH variable accordingly
- after that, run it with the command in the first post
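Roughly, as a copy-paste sketch of those steps (the `--build-arg` override assumes the Dockerfile exposes `CUDA_DOCKER_ARCH` as a build ARG; if it does not, edit the `make server` line in the Dockerfile instead):

```bash
# Fetch the llama.cpp submodule, then build the image.
git submodule update --init --recursive
# Put your models into ./models first; the build compiles CUDA kernels for
# all GPU architectures by default, which takes a while.
docker build -t glados .

# Optional (assumption: CUDA_DOCKER_ARCH is exposed as a build ARG) - limit
# compilation to your own GPU architecture, e.g. sm_86 for an RTX 30xx card:
# docker build -t glados --build-arg CUDA_DOCKER_ARCH=sm_86 .

# Then run it with the docker run command from the first post.
```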
Thank you for that, but although it is running now, whether locally or in Docker, this line is the bane:
/usr/local/lib/python3.11/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
so it keeps falling back to the CPU, which makes it run very slowly. Llama runs nice and fast on the GPU meanwhile.
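A quick diagnostic (a sketch, assuming Python and onnxruntime are importable inside the container or environment) is to check which execution providers the installed onnxruntime build actually exposes:

```bash
# If this prints only AzureExecutionProvider/CPUExecutionProvider, the
# CPU-only onnxruntime package is installed; CUDAExecutionProvider comes
# from the onnxruntime-gpu package instead.
python3 -c "import onnxruntime; print(onnxruntime.get_available_providers())"
```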
@bitbyteboom looks like the fix is to change onnxruntime to onnxruntime-gpu in requirements.txt
Yes, but this will break Mac installs.
Still working on separating it all out!
Maybe we pip install the needed packages in the Dockerfile, and ignore the requirements.txt?
@dnhkng A possible solution is to fork requirements.txt and make one version for Docker and one for Mac, and leave the default one for Linux, i.e. requirements.docker.txt and requirements.mac.txt
OK, let's do that. Let's make a Docker requirements file using onnxruntime-gpu, and for the Dockerfile, assume CUDA.
That will cover the majority of Windows users, and anyone with another GPU can use it as a starting point. Please add documentation for the Docker install to the README, and we'll merge this in.
Please add links to the CUDA and Docker installation instructions, or we'll have hundreds of issues asking for help.
In the Dockerfile, could we also build just the server in llama.cpp?
> @bitbyteboom looks like the fix is to change onnxruntime to onnxruntime-gpu in requirements.txt
Yes, I saw that and already had onnxruntime-gpu in requirements.txt. Could the issue be that for libwhisper.so I ran "WHISPER_CUDA=1 make libwhisper.so -j"? I had thought that would be what's needed, rather than just "make libwhisper.so" as in the instructions.
Tried it both ways but still getting the ".local/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'"
Don't get me wrong, it's working overall and it's a very fun project to tinker with. Hope to have something positive to add in the future. I'm trying to turn this into a sassy coach for Olympiad subjects for our 7 year old son who is in love with Portal.
Should be unrelated. The onnxruntime is used in TTS, not by the whisper model.
Is that error message from docker?
I think I've mixed the two up. Just starting fresh and going with running local for linux and will try docker in morning for windows.
@dnhkng
> However, this application uses the sounddevice library in both the ASR and TTS sections of code. I think setting up these properly will increase the complexity a lot.
Not necessarily, if the same base image is used for both the ASR and TTS containers. Ideally you'd just pass the appropriate microphone(s) and speaker(s) through to the right container, ensuring that PulseAudio/ALSA only has a single device/card to select (and you'd probably also need to disable the dummy configs).
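For example, with ALSA the device passthrough could be as simple as the following sketch (the `glados-tts` image name is hypothetical; PulseAudio setups would instead mount the Pulse socket, as in the WSL2 run command above):

```bash
# Pass the host sound devices into a (hypothetical) TTS container so ALSA
# inside it only sees the card it should use.
docker run --device /dev/snd --gpus=all glados-tts
```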
> I see this docker solution as a simple way to get everything running in a single command, which has some nice benefits for non-technical users.
True, but as you can see it's already causing issues for people who have little to no knowledge of how to use Docker. It's a real conundrum to resolve, because on one hand you do not want to alienate potential users (there are plenty of examples in the open source scene of arsehole devs driving people away, like the Valetudo project), but on the other hand, you are not a personal helpline to teach people the basics... especially when they are running completely foreign code blindly.
That said, a well-crafted multi-container approach with a provided Docker Compose file could work wonders, and would be much more useful than a monolithic container.
> In the future, I see a real need for Microservices, for example a vectorDB for storing memories. But I would wait until we have a definite need before we start with that kind of engineering.
IMO that's the wrong approach - a definitive architecture needs to be in place before we start tacking on features.
Any other requests in order to make this mergeable? Or is it now not needed due to the Windows install script?
Nope, happy to merge, but I had some issues building the Docker image. Once I know it works reliably, I look forward to merging.
Can `git submodule update --init --recursive` go into the Dockerfile?
Just tried the Dockerfile build, and although it works, the TTS inference is super slow.
2024-05-13 09:15:26.774802449 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
It seems to be running on the CPU, and takes several seconds instead of a few hundred milliseconds. Did you hit this issue? I'm on Windows 11, NVIDIA-SMI 546.01, Driver Version: 546.01, CUDA Version: 12.3.
> Can `git submodule update --init --recursive` go into the Dockerfile?

Better not to; it will be hard to cache.
> Just tried the Dockerfile build, and although it works, the TTS inference is super slow.
> 2024-05-13 09:15:26.774802449 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
> It seems to be running on the CPU, and takes several seconds instead of a few hundred milliseconds. Did you hit this issue? I'm on Windows 11, NVIDIA-SMI 546.01, Driver Version: 546.01, CUDA Version: 12.3.
It worked for me without issues; I can help you debug that on Discord. The TTS slowness was present when there was no onnxruntime-gpu.
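One quick sanity check (a sketch, assuming the NVIDIA Container Toolkit is installed and the CUDA base tag shown is available) is to confirm that containers can see the GPU at all before blaming the Python packages:

```bash
# Should print the same GPU table as on the host; if this fails, the problem
# is the Docker/WSL2 GPU setup rather than onnxruntime.
docker run --rm --gpus=all nvidia/cuda:11.7.1-base-ubuntu22.04 nvidia-smi
```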
OK, let's merge and let people at least test it. I think you need to merge in main again, to align with the current config, espeak binaries, etc., as the Dockerfile branch is 26 commits behind main.
> OK, let's merge and let people at least test it. I think you need to merge in main again, to align with the current config, espeak binaries, etc., as the Dockerfile branch is 26 commits behind main.
Done
One last change: as the requirements for Docker are the same as for the standard CUDA system, let's reuse the requirements_cuda.txt file for Docker, and not create a new, identical requirements.docker.txt file.