OpenChatKit
Build a Docker image for OpenChatKit
Is your feature request related to a problem? Please describe. A Docker image might make OpenChatKit easier for people to use.
Describe the solution you'd like We could add a /docker folder or a simple Dockerfile to the repo so people can build the image themselves. We could also push the image to Docker Hub so users can simply pull it and test.
Thanks for the feature request. This is a great idea. Will put it on the roadmap.
Hey @Jonuknownothingsnow, I am new to open source and would be very happy to work on this idea under your guidance.
Dockerfile
# Base image
FROM ubuntu:20.04
# Set working directory
WORKDIR /app
# Update and install required packages
RUN apt-get update && \
apt-get install -y git-lfs wget && \
rm -rf /var/lib/apt/lists/*
# Download and install Miniconda
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
rm Miniconda3-latest-Linux-x86_64.sh
# Set conda to automatically activate base environment on login
RUN echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
# Create OpenChatKit environment
COPY environment.yml .
RUN conda env create -f environment.yml
# Install Git LFS
RUN git lfs install
# Copy OpenChatKit code
COPY . .
# Prepare GPT-NeoX-20B model
RUN python pretrained/GPT-NeoX-20B/prepare.py
# Set entrypoint to bash shell
ENTRYPOINT ["/bin/bash"]
Build the Docker image using the following command:
docker build -t openchatkit .
Run the Docker container using the following command:
docker run -it openchatkit
This will start a new bash shell in the container. Activate the OpenChatKit environment by running the following command:
conda activate OpenChatKit
You should now be able to use the OpenChatKit code and run the prepare.py script.
As I mentioned in the PR, both the pretrained model and datasets can be quite large.
$ du -sh data/* pretrained/GPT-NeoX-20B/
172G data/OIG
238M data/OIG-moderation
38G data/wikipedia-3sentence-level-retrieval-index
39G pretrained/GPT-NeoX-20B/
The Dockerfile above bakes the 39GB pretrained model into the image. In my opinion, it would be better to download the pretrained model into a bind mount when the container starts. The image would be much smaller, and the bind mount persists the model across container restarts.
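A minimal sketch of that bind-mount approach (the openchatkit tag comes from the build command above; the host directory name and the /app paths are assumptions, and this assumes the prepare.py step is dropped from the Dockerfile):

```shell
# Keep the ~39GB of model weights on the host instead of baking them into the image.
mkdir -p "$HOME/openchatkit-models"
docker run -it \
    -v "$HOME/openchatkit-models:/app/pretrained/GPT-NeoX-20B" \
    openchatkit
# Note: this mount shadows the repo's copy of pretrained/GPT-NeoX-20B/
# (including prepare.py), so prepare.py would need to be copied into the
# mounted directory, or run from a host checkout, on the first start.
```

After the first download completes, subsequent `docker run` invocations with the same `-v` flag reuse the weights already on the host.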
Sure, I will try to do it.
Hello,
I'm just starting with OpenChatKit. I was looking into using Docker and found your Dockerfile.
One question: for training the model, I understand CUDA + NVIDIA GPUs are used if available?
If so, I found this yesterday, which may be useful: https://blog.roboflow.com/nvidia-docker-vscode-pytorch/ It looks like there are NVIDIA-accelerated Docker base images such as nvidia/cuda:11.0.3-base-ubuntu20.04 ready to be used. (Note: because I use docker-compose, I had to update to version 1.28.0+ to be able to configure the '--gpus all' parameter.)
Thank you
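Following up on that note, a quick way to check GPU passthrough with one of those base images (assuming the NVIDIA Container Toolkit is installed on the host) would be:

```shell
# If --gpus passthrough works, nvidia-smi lists the host GPUs
# from inside the container.
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
```

Using an nvidia/cuda image as the FROM line of the Dockerfile above would give the OpenChatKit container the same GPU access.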
Hello there,
Just a mention that I had issues building the environment with the package netifaces, which I solved by updating the environment.yml file from netifaces===0.11.0 to netifaces2==0.0.16.
Thank you
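For reference, the edit described above would look roughly like this in environment.yml (a sketch; whether netifaces sits under the conda dependencies or the pip subsection depends on the repo's actual file):

```yaml
# environment.yml (excerpt, hypothetical layout)
dependencies:
  - pip:
      # was: netifaces===0.11.0
      - netifaces2==0.0.16
```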
This can be fixed by installing gcc. On Ubuntu, you'd run sudo apt install gcc. That should fix your error!
Thanks for the great resource @xsanz! I was able to get the model loaded onto the GPU in Docker using those instructions.
The conda binary was not found during my docker build. If anyone runs into this issue, I had to set the PATH before running conda:
ENV PATH=/opt/conda/bin/:$PATH
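In the Dockerfile above, that fix would go right after the Miniconda install and before the first RUN step that invokes conda, roughly like this (a sketch):

```dockerfile
# Download and install Miniconda (as above) ...
# Make the conda binary visible to subsequent RUN steps:
ENV PATH=/opt/conda/bin/:$PATH
# Now conda resolves without a full path:
RUN conda env create -f environment.yml
```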