
Docker Image for running distilabel CLI

Open ignacioct opened this issue 9 months ago • 2 comments

Closes #608

I've implemented two images for running distilabel: one that builds from runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04 and can use CUDA, and a more constrained one that builds from python:3.11-slim. To try them out:

docker build --tag distilabel_cuda --file docker/CUDA.Dockerfile . --no-cache
docker run --rm distilabel_cuda distilabel pipeline run --config "https://huggingface.co/datasets/distilabel-internal-testing/test-dockerfile-2/raw/main/pipeline.yaml"
docker build --tag distilabel_local --file docker/local.Dockerfile . --no-cache
docker run --rm distilabel_local distilabel pipeline run --config "https://huggingface.co/datasets/distilabel-internal-testing/test-dockerfile-2/raw/main/pipeline.yaml"

I was unsure which dependencies to include in the local image. Do you have any ideas, @plaguss @gabrielmbmb?

ignacioct, May 07 '24 11:05

As suggested by @alvarobartt, we should work with the NVIDIA base images. I'll copy his comments here:

  • This should be enough to get started:
FROM nvidia/cuda:12.3.0-base-ubuntu22.04 AS build

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install python3 python3-pip -y

RUN ln -s /usr/bin/python3 /usr/bin/python
ENV PYTHON=/usr/bin/python

ARG TORCH="2.2.0"

RUN python -m pip install --no-cache-dir --upgrade pip && \
    python -m pip install --no-cache-dir torch==${TORCH}
  • You could also add build arguments (ARG) for the CUDA and Ubuntu versions, as well as for distilabel itself, so that the same Dockerfile can build images for multiple CUDA versions (ideally only 12.3 and 11.8 should be needed).

  • We'll also need to take into account that the image may be run from Linux distributions or from Windows. I'm not sure whether Windows requires some flags to be set to properly identify the GPU, but it may be worth double-checking.
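
To make the build-argument suggestion concrete, here is a minimal sketch that extends the snippet above. The ARG names (CUDA_VERSION, UBUNTU_VERSION, DISTILABEL_VERSION) and the image tags in the comments are assumptions for illustration, not the names used in the PR. Note also that the torch wheel may need to match the chosen CUDA version, which this sketch does not handle.

```dockerfile
# Sketch only: the ARG names below (CUDA_VERSION, UBUNTU_VERSION,
# DISTILABEL_VERSION) are hypothetical and not taken from the PR.
#
# Example builds (the Dockerfile path is a placeholder):
#   docker build --build-arg CUDA_VERSION=12.3.0 --tag distilabel_cuda:12.3 --file <path/to/Dockerfile> .
#   docker build --build-arg CUDA_VERSION=11.8.0 --tag distilabel_cuda:11.8 --file <path/to/Dockerfile> .

# ARGs declared before FROM are only visible to the FROM instruction.
ARG CUDA_VERSION="12.3.0"
ARG UBUNTU_VERSION="22.04"

FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu${UBUNTU_VERSION} AS build

ARG DEBIAN_FRONTEND=noninteractive

# Install Python and pip, then drop the apt cache to keep the layer small.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3 /usr/bin/python
ENV PYTHON=/usr/bin/python

ARG TORCH="2.2.0"
# Empty by default, in which case pip installs the latest distilabel release.
ARG DISTILABEL_VERSION=""

# The ${VAR:+...} expansion appends "==<version>" only when a version is given.
RUN python -m pip install --no-cache-dir --upgrade pip && \
    python -m pip install --no-cache-dir torch==${TORCH} \
        "distilabel${DISTILABEL_VERSION:+==${DISTILABEL_VERSION}}"
```

Declaring the CUDA and Ubuntu versions before FROM lets a single file cover both 12.3 and 11.8 by changing only the --build-arg values at build time.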

plaguss, May 14 '24 10:05

@plaguss this should be good to go. As I discussed with @alvarobartt yesterday, we are sticking with the runpod image for now. I added the build arguments and, as far as my research goes, there should be no problems on Windows. If any arise, I'm happy to adapt the image based on further feedback.

ignacioct, May 17 '24 10:05