text-generation-inference
converting docker images to singularity
Feature request
I am trying to run TGI on an HPC cluster. I tried pulling the Docker images with Singularity, but in that case the custom kernels do not work and CUDA complains that the PTX was built with another version of the toolchain. Is there any solution to this?
I also wanted to build the image from the Dockerfile, but the transition from a Dockerfile to a Singularity image is not straightforward.
What are my options? What is the problem with those custom kernels in my case?
Motivation
Many HPC clusters do not let users run Docker.
Your contribution
I can test the suggested methods to solve the problem and later on post the steps that worked for me.
I am also in an HPC environment. "--disable-custom-kernels" works for me. However, I don't know what the impact would be.
What kind of GPU is it? H100? I'll look into this and see why it fails on some platforms. I'm guessing the kernels are built against an incompatible compute_arch.
Custom kernels shouldn't be necessary most of the time (they are only used for NEOX and BLOOM).
Could be a duplicate of #739
I've had success pulling the official Docker image for my platform and then building a Singularity image from the Docker archive:
docker pull --platform amd64 xxx
docker save xxx -o xxx.tar
singularity build xxx.sif docker-archive://xxx.tar
I can confirm that disabling the custom kernels via the DISABLE_CUSTOM_KERNELS environment variable works for running an Apptainer container with an A100. If I do not set this variable, I get the same CUDA error: RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain. My full CLI command to start my container with a local LLM model is:
apptainer run \
    --nv \
    --bind $volume:/data \
    --env DISABLE_CUSTOM_KERNELS=true \
    hf_text_generation_inference_v110.sif \
    --model-id ./data/models--bigcode--starcoderbase-3b/snapshots/e1c5ef4ebb97afa0db09ec3e520f0487ca350bbe/ \
    --port 8000
I imagine that it would work the same for a Singularity container.
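For anyone curious how such a flag is typically consumed, the env-var convention can be sketched like this. This is a minimal illustration only, not TGI's actual code; custom_kernels_enabled is a hypothetical helper:

```python
import os

def custom_kernels_enabled(env=os.environ):
    """Return False when DISABLE_CUSTOM_KERNELS is set to a truthy value.

    Hypothetical helper showing the usual env-var pattern: any of
    "1", "true", or "yes" (case-insensitive) disables the kernels.
    """
    val = env.get("DISABLE_CUSTOM_KERNELS", "").strip().lower()
    return val not in {"1", "true", "yes"}

# With the flag set (as in the apptainer command above), kernels are skipped:
print(custom_kernels_enabled({"DISABLE_CUSTOM_KERNELS": "true"}))  # False
# Without it, custom kernels would be loaded:
print(custom_kernels_enabled({}))  # True
```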
Fwiw, my driver and CUDA settings are:
NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7
It's possible that your driver is too old. As you can see, it supports CUDA versions only up to 11.7, while TGI is using 11.8.
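The mismatch above boils down to a simple version comparison: the CUDA version reported by nvidia-smi must be at least the toolkit version the image was built with (11.8 for TGI here, per this thread). A rough sketch, with driver_supports_image as a hypothetical helper:

```python
def driver_supports_image(driver_cuda: str, image_cuda: str) -> bool:
    """Rough compatibility check: the driver's reported CUDA version
    must be >= the CUDA toolkit version the container image was built with.
    Older drivers reject PTX compiled by a newer toolchain."""
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(driver_cuda) >= to_tuple(image_cuda)

# Driver 515.48.07 reports CUDA 11.7; TGI's image targets 11.8:
print(driver_supports_image("11.7", "11.8"))  # False -> "unsupported toolchain" errors
print(driver_supports_image("12.2", "11.8"))  # True
```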