
Docker Image

Scot-Survivor opened this issue 1 year ago • 35 comments

Added some commits, building on DanCodes' original pull request that was closed.

Scot-Survivor avatar Aug 24 '24 18:08 Scot-Survivor

This will likely need a better cleanup from people who know how to write Docker files better than I do.

But I believe this is a good starting point.

Scot-Survivor avatar Aug 24 '24 18:08 Scot-Survivor

This looks great! Thanks for the contribution (also fixes #119)

Some small things:

  • (EDIT - just saw you already have this) EXPOSE in the Dockerfile for the ports that are exposed (it’s only for documentation and has no effect)
  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)
  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?
  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly (a rough sketch follows after this comment)
  • Related to above: if we can run a multi-node test in CI that would be super

AlexCheema avatar Aug 24 '24 18:08 AlexCheema
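Regarding the multi-node docker-compose.yml idea above, a minimal sketch of what such a file might look like, assuming the image is built from the Dockerfile in this PR (the service names, network name, and NODE_ID variable are illustrative, not the actual file from this PR):

services:
  exo-node-1:
    build: .
    environment:
      - NODE_ID=node-1
    networks: [exo]
  exo-node-2:
    build: .
    environment:
      - NODE_ID=node-2
    networks: [exo]
networks:
  exo:
    driver: bridge

Both containers sit on the same bridge network, so the two exo processes can reach each other for peer discovery; adding more nodes is just a matter of adding more services.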

It may be wise to provide two separate Dockerfiles, as not all devices run NVIDIA GPUs. However, I have not looked much at the source code; I assume CUDA isn't a fixed requirement?

Axodouble avatar Aug 26 '24 06:08 Axodouble

Heya, I'll be checking this out today. Some context on the original PR: I was just merging it to main in our fork late at night, so I messed up the target. However, I'm glad to see it would be helpful here. First I'll rebase this to resolve the conflicts I'm seeing. As for your comments:

  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)

I agree here; that's probably the best way to move forward. Would you prefer them in, say, a docker/ folder or just at the root? Personally I try to limit files at the root, but obviously if you have a preference I'll follow that.

  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?

Yep, I can add a CD GitHub Action to this PR; it's just up to you guys to create an org and add the token to the repo's Action secrets. A rough sketch of such a workflow follows after this comment.

  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly

Great idea; this could also go in the aforementioned docker/ folder.

  • Related to above: if we can run a multi-node test in CI that would be super

Up to you whether you think this is in scope for this PR; I think it's a nice-to-have, so maybe one for a future feature.

dan-online avatar Aug 28 '24 12:08 dan-online
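On the continuous-delivery point above, a Docker Hub publish workflow typically looks roughly like the sketch below (the workflow file name, image tag, and secret names are placeholders; the real token would live in the repo's Action secrets as discussed):

# .github/workflows/docker-publish.yml (sketch)
name: docker-publish
on:
  push:
    branches: [main]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Log in with credentials stored as repository Action secrets
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      # Build from the repo root and push the image to Docker Hub
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: exolabs/exo:latest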

Heya, I'll be checking this out today. Some context on the original PR: I was just merging it to main in our fork late at night, so I messed up the target. However, I'm glad to see it would be helpful here. First I'll rebase this to resolve the conflicts I'm seeing. As for your comments:

  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)

I agree here; that's probably the best way to move forward. Would you prefer them in, say, a docker/ folder or just at the root? Personally I try to limit files at the root, but obviously if you have a preference I'll follow that.

At the root is fine.

  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?

Yep, I can add a CD GitHub Action to this PR; it's just up to you guys to create an org and add the token to the repo's Action secrets.

We can create an org. Someone has already taken exolabs unfortunately, so I've requested to claim that name.

  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly

Great idea; this could also go in the aforementioned docker/ folder.

:)

  • Related to above: if we can run a multi-node test in CI that would be super

Up to you whether you think this is in scope for this PR; I think it's a nice-to-have, so maybe one for a future feature.

Let's leave it to a future PR then. For now, the docker-compose.yml can serve as documentation / quick test locally.

AlexCheema avatar Aug 28 '24 13:08 AlexCheema

Does exo not use all GPUs available to the PC by default?

Why would someone want multiple workers in a compose file? Compose only works with one host; it's not multi-node orchestration like Kubernetes.

Scot-Survivor avatar Aug 28 '24 14:08 Scot-Survivor

Does exo not use all GPUs available to the PC by default?

Why would someone want multiple workers in a compose file? Compose only works with one host; it's not multi-node orchestration like Kubernetes.

exo does not use multi-GPU by default. If you have a single device with multiple GPUs, you can (e.g. with the tinygrad backend) set VISIBLE_DEVICES={index}, where {index} starts from 0, e.g. VISIBLE_DEVICES=1 for index 1. Specifically for CUDA, this would be CUDA_VISIBLE_DEVICES={index}.

AlexCheema avatar Aug 28 '24 15:08 AlexCheema
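As a concrete example of the above, running one exo instance per GPU on a single machine could look like this (the entrypoint path and --disable-tui flag are taken from the Dockerfile later in this thread; the two-GPU setup itself is hypothetical):

# Each process only sees the GPU it is pinned to, so it acts as its own node.
CUDA_VISIBLE_DEVICES=0 python3 ./exo/main.py --disable-tui &
CUDA_VISIBLE_DEVICES=1 python3 ./exo/main.py --disable-tui &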

@AlexCheema Feel free to review!

dan-online avatar Aug 28 '24 16:08 dan-online

@dan-online, at least one other Dockerfile for non-GPU-accelerated computers would be useful (and to us specifically).

Scot-Survivor avatar Aug 28 '24 16:08 Scot-Survivor

Did Alpine work? Ubuntu is massive. The Python Alpine base image should work, right?

Scot-Survivor avatar Aug 28 '24 16:08 Scot-Survivor

Alpine was tricky, so I pushed an Ubuntu image first just to check that it would work before I try tackling Alpine again.

dan-online avatar Aug 28 '24 16:08 dan-online

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour, haha.

dan-online avatar Aug 28 '24 17:08 dan-online

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour, haha.

We shouldn't have a TensorFlow dependency. When I run pip list, tensorflow does not come up. Why do we need TensorFlow?

AlexCheema avatar Aug 30 '24 11:08 AlexCheema

Secured the exolabs Docker Hub namespace now!

AlexCheema avatar Aug 30 '24 13:08 AlexCheema

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour, haha.

We shouldn't have a TensorFlow dependency. When I run pip list, tensorflow does not come up. Why do we need TensorFlow?

@dan-online, did you get a chance to follow up today?

Scot-Survivor avatar Sep 03 '24 15:09 Scot-Survivor

Heya @AlexCheema, it seems tensorflow (or similar) is requested upon boot:

[screenshot: tensorflow request]

dan-online avatar Sep 05 '24 17:09 dan-online

Heya @AlexCheema, it seems tensorflow (or similar) is requested upon boot:

[screenshot: tensorflow request]

This looks fine. That warning can be ignored, it comes from the transformers library but we don't use models from there.

AlexCheema avatar Sep 24 '24 12:09 AlexCheema

Heya @AlexCheema, it seems tensorflow (or similar) is requested upon boot: [screenshot: tensorflow request]

This looks fine. That warning can be ignored, it comes from the transformers library but we don't use models from there.

Weirdly, it didn't actually boot anything without TensorFlow installed; it would just stop at that warning.

dan-online avatar Sep 25 '24 15:09 dan-online

Hey @AlexCheema any info here?

dan-online avatar Oct 06 '24 20:10 dan-online

Heya @AlexCheema, it seems tensorflow (or similar) is requested upon boot: [screenshot: tensorflow request]

This looks fine. That warning can be ignored, it comes from the transformers library but we don't use models from there.

Weirdly, it didn't actually boot anything without TensorFlow installed; it would just stop at that warning.

That is weird. Either way, I'm removing the dependency on transformers. Issue here: #169

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

If you installed transformers explicitly, was there any other issue?

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

Hey @AlexCheema let me know any changes needed!

dan-online avatar Oct 28 '24 14:10 dan-online

Any updates on the acceptance of this PR?

Scot-Survivor avatar Nov 10 '24 16:11 Scot-Survivor

Ser @Scot-Survivor, great work in this PR. I have a couple of questions:

  • What is the advantage of using nvidia/cuda:12.5.1-cudnn-runtime-ubuntu22.04 as a base?
  • What if we use python:3.12 as a base image? It requires less effort for setting up the Python environment. Here is a Dockerfile inspired by your PR:
# syntax=docker/dockerfile:1
FROM python:3.12

ENV WORKING_PORT=8080
ENV DEBUG=1
ENV DEBIAN_FRONTEND=noninteractive

SHELL ["/bin/bash", "-cx"]

COPY . .
RUN <<EOF
apt-get update -y
apt-get install --no-install-recommends -y git gnupg build-essential software-properties-common curl libgl1 uuid-runtime clang
pip install --no-cache-dir -e .
pip install mlx llvmlite
pip3 cache purge
EOF
# NODE_ID must be resolved at run time: an export inside RUN does not persist
# to the running container, and an exec-form CMD would not expand $NODE_ID.
# Use the shell form and fall back to a generated UUID when NODE_ID is unset.
CMD python3 ./exo/main.py --node-host 0.0.0.0 --disable-tui --node-id "${NODE_ID:-$(uuidgen)}"

pratikbin avatar Jan 14 '25 06:01 pratikbin
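For completeness, building and running an image from a Dockerfile like the one above might look something like this (the tag name is illustrative; host networking is just one simple way to keep the node reachable for peer discovery):

# Build from the repo root
docker build -t exo:local .
# NODE_ID is optional; the CMD above falls back to a generated UUID
docker run --rm --net host -e NODE_ID=node-1 exo:local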

Is the NVIDIA base not required for NVIDIA GPU support on desktop?

Scot-Survivor avatar Jan 14 '25 06:01 Scot-Survivor

This was what I was about to mention. For some reason, my container is not detecting NVIDIA GPUs even though I set the runtime for GPU. I don't think this is because of not having an NVIDIA base image, but again, I'm not experienced with GPU containers. I need your expert opinion.

pratikbin avatar Jan 14 '25 06:01 pratikbin
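For what it's worth, GPU visibility inside a container usually comes down to the host having the NVIDIA driver plus the NVIDIA Container Toolkit installed, and the container being started with GPU access (--gpus all, or the equivalent compose/runtime setting). A quick sanity check, reusing the base image mentioned above (the exo image tag is illustrative):

# Can a container see the GPUs at all?
docker run --rm --gpus all nvidia/cuda:12.5.1-cudnn-runtime-ubuntu22.04 nvidia-smi
# If that works, run the exo image with the same flag
docker run --rm --gpus all --net host exo:local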

@AlexCheema any plan to merge this?

Scot-Survivor avatar Jan 19 '25 19:01 Scot-Survivor

I have an existing Ollama k8s setup using ollama-helm, including CPU and GPU support.

I would be very happy to test this and create the helm chart to scale it on a cluster.

I have a cluster with many GTX 1060 6 GB cards, like a dozen of them. It would be awesome to get a 60+ GB model running on that.

JMLX42 avatar Feb 05 '25 08:02 JMLX42

Hey, thanks @JMLX42. It would be great if you could build this container image and test it, or I can build it if you want.

https://github.com/exo-explore/exo/pull/173#issuecomment-2589117050

pratikbin avatar Feb 07 '25 14:02 pratikbin

This was what I was about to mention. For some reason, my container is not detecting NVIDIA GPUs even though I set the runtime for GPU. I don't think this is because of not having an NVIDIA base image, but again, I'm not experienced with GPU containers. I need your expert opinion.

All I'm able to say here is that it works on my machine, I'm afraid; I only have the one GPU node to test on.

Scot-Survivor avatar Feb 07 '25 15:02 Scot-Survivor