text-generation-inference
converting docker images to singularity
Feature request
I am trying to run TGI on an HPC cluster. I tried pulling the Docker images with Singularity, but in that case the custom kernels do not work and CUDA complains that the PTX was built with another version of the toolchain. Is there any solution to this?
I also wanted to build the image from the Dockerfile, but the transition from a Dockerfile to a Singularity image is not straightforward.
What are my options? What is the problem with those custom kernels in my case?
Motivation
Many HPC clusters do not let users run Docker.
Your contribution
I can test the suggested methods to solve the problem and later on post the steps that worked for me.
I am also in an HPC environment. "--disable-custom-kernels" works for me. However, I don't know what the impact would be.
What kind of GPU is it? H100? I'll look into this and see why it fails on some platforms. I'm guessing the kernels are built against an incompatible compute_arch.
Custom kernels shouldn't be necessary most of the time (they are only used for NEOX and BLOOM).
Could be a duplicate of #739
I've had success pulling the official Docker image for my platform and then building a Singularity image from the Docker archive:
docker pull --platform amd64 xxx
docker save xxx -o xxx.tar
singularity build xxx.sif docker-archive://xxx.tar
I can confirm that disabling the custom kernels via the DISABLE_CUSTOM_KERNELS environment variable works for running an Apptainer container with an A100. If I do not set this variable, I get the same CUDA error: RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain. My full CLI command to start my container with a local LLM model is:
apptainer run \
    --nv \
    --bind $volume:/data \
    --env DISABLE_CUSTOM_KERNELS=true \
    hf_text_generation_inference_v110.sif \
    --model-id ./data/models--bigcode--starcoderbase-3b/snapshots/e1c5ef4ebb97afa0db09ec3e520f0487ca350bbe/ \
    --port 8000
I imagine that it would work the same for a Singularity container.
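For anyone curious how such a flag is typically consumed, the env-var convention can be sketched like this. This is a minimal illustration only, not TGI's actual code; custom_kernels_enabled is a hypothetical helper:

```python
import os

def custom_kernels_enabled(env=os.environ):
    """Return False when DISABLE_CUSTOM_KERNELS is set to a truthy value.

    Hypothetical helper showing the usual env-var pattern: any of
    "1", "true", or "yes" (case-insensitive) disables the kernels.
    """
    val = env.get("DISABLE_CUSTOM_KERNELS", "").strip().lower()
    return val not in {"1", "true", "yes"}

# With the flag set (as in the apptainer command above), kernels are skipped:
print(custom_kernels_enabled({"DISABLE_CUSTOM_KERNELS": "true"}))  # False
# Without it, custom kernels would be loaded:
print(custom_kernels_enabled({}))  # True
```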
Fwiw, my driver and CUDA settings are:
NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7
It's possible that your driver is too old. As you can see, it supports CUDA versions only up to 11.7, while TGI is using 11.8.
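The mismatch above boils down to a simple version comparison: the CUDA version reported by nvidia-smi must be at least the toolkit version the image was built with (11.8 for TGI here, per this thread). A rough sketch, with driver_supports_image as a hypothetical helper:

```python
def driver_supports_image(driver_cuda: str, image_cuda: str) -> bool:
    """Rough compatibility check: the driver's reported CUDA version
    must be >= the CUDA toolkit version the container image was built with.
    Older drivers reject PTX compiled by a newer toolchain."""
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(driver_cuda) >= to_tuple(image_cuda)

# Driver 515.48.07 reports CUDA 11.7; TGI's image targets 11.8:
print(driver_supports_image("11.7", "11.8"))  # False -> "unsupported toolchain" errors
print(driver_supports_image("12.2", "11.8"))  # True
```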