clipper
Support deploying models with GPU access
For Kubernetes, we can use the experimental GPU support feature: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
For Docker, we can use nvidia-docker.
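As a sketch of the Kubernetes side, a pod can request a GPU through the device-plugin resource `nvidia.com/gpu`. The manifest below is written as a Python dict for illustration (it could equally be YAML); the pod name, container name, and image are placeholders, not Clipper's actual images:

```python
# Sketch of a Kubernetes pod manifest requesting one GPU via the
# device-plugin resource "nvidia.com/gpu". All names/images below are
# placeholders for illustration only.
import json

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "clipper-model-gpu"},
    "spec": {
        "containers": [
            {
                "name": "model-container",
                "image": "example/model-container:latest",  # placeholder image
                "resources": {
                    # Request one NVIDIA GPU from the device plugin.
                    "limits": {"nvidia.com/gpu": 1}
                },
            }
        ]
    },
}

print(json.dumps(pod_manifest, indent=2))
```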
@dcrankshaw I have done some work on this... I would like to pick this up if it's fine with you guys.
Sure that would be great. Have you worked with the Kubernetes GPU support? Go ahead and assign the issue to yourself.
I've been paying rather close attention to this issue, so I'm just wondering if there's been any behind-the-scenes movement on it? Feels like a major value add for clipper.
I just implemented this. It still needs a bit of testing, but I should have a PR up by the end of the week.
Hi @dcrankshaw, I have tried using nvidia-docker. I installed the nvidia-docker package on my local machine and started the model-server containers with nvidia-docker so they could access the machine's GPU resources, but the model server doesn't get GPU access.
For the latest nvidia-docker I believe you need to pass in runtime="nvidia" in docker.containers.run
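A minimal sketch of that call through the Docker SDK for Python, assuming nvidia-docker 2 is installed. The image and command are just placeholders to check GPU visibility, and the call is guarded so the sketch degrades gracefully when no Docker daemon or NVIDIA runtime is available:

```python
# Sketch: with nvidia-docker 2, GPU access is enabled by passing
# runtime="nvidia" to containers.run() in the Docker SDK for Python.
# The image and command below are placeholders for a GPU smoke test.
run_kwargs = {
    "image": "nvidia/cuda:9.0-base",  # placeholder base image
    "command": "nvidia-smi",          # quick check that the GPU is visible
    "runtime": "nvidia",              # select the nvidia-docker 2 runtime
}

try:
    import docker

    client = docker.from_env()
    output = client.containers.run(**run_kwargs)
    print(output)
except Exception as exc:  # no Docker SDK/daemon/NVIDIA runtime here
    print("docker run skipped:", exc)
```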
After weeks of research and trial and error, I have finally got Clipper to work with GPU and TensorFlow. I think it is worth sharing my experience with those who are also looking at this issue. I will try my best to make the steps clear and concise, summarized as follows:
1. To enable GPU support for Clipper, you first need to install nvidia-docker. For detailed steps, refer to: https://github.com/NVIDIA/nvidia-docker
2. Build your own nvidia-docker image, which will serve as the base image when you build and deploy your Clipper containers. I referred to the following:
◦ https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.0/base/Dockerfile
◦ https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.0/runtime/Dockerfile
and constructed my own Dockerfile to build an nvidia-docker image with the CUDA runtime. Do not hesitate to overwrite the default PATH and LD_LIBRARY_PATH; I observed that they were not pointing to the right folders. Instead, use the following values:
ENV PATH /usr/local/cuda:${PATH}
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
Also, you are expected to install the following packages:
• Python3
• python3-pip
• libzmq5
• redis-server
• libsodium18
• build-essential
and also the python packages:
• cloudpickle
• pyzmq
• prometheus_client
• pyyaml
• jsonschema
• redis
• psutil
• flask
• numpy
3. Next, ensure you have also installed cuDNN (refer to: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html). In my case, since I already had the required files on my host machine, all I needed to do was copy them over to the docker image.
4. Create a top-level directory in your docker image and name it container.
5. Copy the following files from your host to /container in the docker image:
COPY containers/python/__init__.py containers/python/tf_container.py containers/python/container_entry.sh containers/python/rpc.py /container/
COPY monitoring/metrics_config.yaml /container/
If you are unsure where to get those files, here is the link: https://github.com/ucbrise/clipper/tree/develop/containers/python
Next, make a minor revision to rpc.py at line 757, changing:
cmd = ['python', '-m', 'clipper_admin.metrics.server']
to:
cmd = ['python3', '-m', 'clipper_admin.metrics.server']
6. Upgrade pip3 to a newer version:
RUN pip3 install --upgrade pip
7. Install tensorflow-gpu and clipper_admin
8. Set the following:
ENV CLIPPER_MODEL_PATH=/model
CMD ["/container/container_entry.sh", "tensorflow-container", "/container/tf_container.py"]
HEALTHCHECK --interval=3s --timeout=3s --retries=1 CMD test -f /model_is_ready.check || exit 1
Note: the HEALTHCHECK statement is important, as clipper_admin needs this information when starting your model.
9. Modify the /etc/docker/daemon.json by adding the following entry:
"default-runtime": "nvidia",
and then restart the docker service to make the above configuration effective.
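For reference, a complete /etc/docker/daemon.json with the nvidia runtime registered typically looks like the fragment below (this is the standard layout installed by nvidia-docker 2; the path assumes nvidia-container-runtime is on the default PATH):

```json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```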
Now you are ready to kick-start Clipper with GPU support. I hope the steps above are useful to you all. I have also provided a docker template here: https://github.com/cwtan501/nvidia_tf_template
Hi @cwtan501, I tried to run your template on an AWS p2 instance but it failed to build the docker image at:
Step 22/35 : RUN apt-get update && apt-get install -y --no-install-recommends cuda-libraries-$CUDA_PKG_VERSION cuda-cublas-9-0=9.0.176.4-1 libnccl2=$NCCL_VERSION-1+cuda9.0 && apt-mark hold libnccl2 && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> b272447bbe1b
Step 23/35 : RUN mkdir -p /usr/local/cuda/include
 ---> Using cache
 ---> 7ab576eadefb
Step 24/35 : COPY /cuda/include/* /usr/local/cuda/include/
COPY failed: no source files were specified
Now nvidia provides cuda docker images, so we can try:
FROM nvidia/cuda:9.2-cudnn7-runtime
# alias python3 -> python
RUN echo '#!/bin/bash\npython3 "$@"' > /usr/bin/python && \
chmod +x /usr/bin/python
# install binary dependencies first (python3 and pip must exist before the pip install below)
RUN mkdir -p /model \
    && apt-get update -qq \
    && apt-get install -y -qq python3 python3-pip libzmq5 libzmq5-dev redis-server libsodium18 build-essential
# install python dependencies
RUN pip3 install cloudpickle==0.5.* pyzmq==17.0.* requests==2.18.* scikit-learn==0.19.* \
    numpy==1.14.* pyyaml==3.12.* docker==3.1.* kubernetes==5.0.* tensorflow==1.6.* mxnet==1.1.* pyspark==2.3.* \
    xgboost==0.7.*
# make sure you run this inside clipper directory
COPY clipper_admin /clipper_admin/
RUN cd /clipper_admin \
    && pip3 install -q .
WORKDIR /container
COPY containers/python/__init__.py containers/python/rpc.py /container/
COPY monitoring/metrics_config.yaml /container/
ENV CLIPPER_MODEL_PATH=/model
HEALTHCHECK --interval=3s --timeout=3s --retries=1 CMD test -f /model_is_ready.check || exit 1
RUN pip3 install -q tensorflow==1.6.*
COPY containers/python/tf_container.py containers/python/container_entry.sh /container/
CMD ["/container/container_entry.sh", "tensorflow-container", "/container/tf_container.py"]
Make sure you run docker build inside the clipper directory; a fresh git clone should do.
Hi @wcwang07! Clipper is adding native support for PyTorch and TF on CUDA 10. I've made a PR adding support for PyTorch + CUDA 10 on docker, and will be rolling out TF support soon. This can be run on an AWS p2 instance; make sure to choose the Deep Learning AMI (Ubuntu) Version 21.
@simon-mo @RehanSD I was using FROM tensorflow/tensorflow:latest-gpu-py3; this image seems to resolve the issue with finding the GPU:0 device.
@simon-mo I ran this new gpu container with the following stats:
recv: 0.000223 s, parse: 0.000013 s, handle: 0.157390 s
check it out at
docker pull wcwang07/test-gpu-container
clipper_conn.register_application(name="hello-tf", input_type="int", default_output="this is default output", slo_micros=3000000)
https://gist.github.com/wcwang07/aef2d54c134f7c43e726bf9d027770c9
python_deployer.deploy_tensorflow_model(clipper_conn=clipper_conn, name="tf-mobilnet", version=1, input_type="int", func=predict, tf_sess_or_saved_model_path='***', base_image='test-gpu-container', pkgs_to_install=['pillow'])
clipper_conn.link_model_to_app(app_name="hello-tf", model_name="tf-mobilnet")
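Once the model is linked to the app, it can be queried over Clipper's REST interface. A minimal sketch, using a placeholder query address (the real one comes from clipper_conn.get_query_addr()) and guarded so it degrades gracefully without a running cluster:

```python
# Sketch: querying the "hello-tf" application via Clipper's REST endpoint.
# query_addr below is a placeholder; in practice use clipper_conn.get_query_addr().
import json

query_addr = "localhost:1337"  # placeholder address
url = "http://%s/hello-tf/predict" % query_addr
payload = {"input": [1, 2, 3]}  # matches input_type="int" from register_application

try:
    import requests

    r = requests.post(url, headers={"Content-Type": "application/json"},
                      data=json.dumps(payload))
    print(r.json())
except Exception as exc:  # no Clipper cluster reachable in this environment
    print("query skipped:", exc)
```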
Docker support is addressed in PR #669.