OlivierDehaene comments

Results: 149 comments of OlivierDehaene

AsyncInferenceClient - Unclosed client session

@sapountzis, we had an issue in the past with CUDA graphs + quantization. Were you using that by any chance? It should be fixed in v2.0.0.

AsyncInferenceClient - Unclosed client session

@sapountzis, we are deprecating bnb in favour of eetq as it is way faster. @mhillebrand, the "Too many open files" is happening for the client so tweaking the container ulimit...
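
Since the error is reported on the client side, one way to look at the open-file limit of the client process is Python's standard `resource` module. This is a minimal sketch, assuming a Unix-like client machine; the helper names `nofile_limits` and `raise_nofile_soft_limit` are hypothetical and not part of TGI or any client library, and raising the limit is only a possible mitigation, not a confirmed fix for this issue.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_nofile_soft_limit(target):
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    The soft limit is never lowered and never pushed above the hard limit.
    Returns the soft limit in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft


if __name__ == "__main__":
    print("open-file limits (soft, hard):", nofile_limits())
```

Calling `raise_nofile_soft_limit(n)` at client start-up only affects the client process; it does not change anything inside the server container.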

TGI v1.3.1 Docker container Mixtral 8x7b not running

Can you run --env ?

TGI v1.3.1 Docker container Mixtral 8x7b not running

That seems to be an issue with Triton. Maybe this comment can help? https://github.com/pytorch/pytorch/issues/107960#issuecomment-1783432552

TGI v1.3.1 Docker container Mixtral 8x7b not running

Weird. We run all our deployments in k8s and don't run into this issue. I will have a look.

TGI v1.3.1 Docker container Mixtral 8x7b not running

- Do you guys deploy in EKS, GKE, something else?
- Do you use the Nvidia Operator, Nvidia device plugin?
- Do you overwrite the NVIDIA_DRIVER_CAPABILITIES, or LD_LIBRARY_PATH env var...

TGI v1.3.1 Docker container Mixtral 8x7b not running

> We aren't using Kubernetes; we convert the public docker image to singularity to run on our HPC system. I think the original issue came from trouble running the image...

converting docker images to singularity

It's possible that your driver is too old. As you can see, it supports CUDA versions up to 11.7, while TGI is using 11.8.
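
The comparison above can be sketched as a small check on the header that `nvidia-smi` prints, which includes a `CUDA Version: X.Y` field giving the highest CUDA version the driver supports. This is a rough illustration only: the function names are hypothetical, the sample header line is illustrative rather than taken from the issue, and the required version `(11, 8)` comes from the comment above.

```python
import re


def max_cuda_from_smi(smi_output):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi output.

    Returns a (major, minor) tuple, or None if the field is missing.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports(smi_output, required=(11, 8)):
    """Return True if the driver reports support for at least `required`."""
    supported = max_cuda_from_smi(smi_output)
    return supported is not None and supported >= required


# Illustrative header line, not copied from the original issue.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |"
)

if __name__ == "__main__":
    print("driver max CUDA:", max_cuda_from_smi(SAMPLE_HEADER))
    print("supports 11.8:", driver_supports(SAMPLE_HEADER))
```

In practice the text passed in would be the captured output of running `nvidia-smi` on the host.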

Memory leak from long-duration inference

@Ichigo3766, what type of hardware do you run TGI with? And are you on > v1?

Webserver crashing with GPTQ model `Server error: transport error Error: Warmup(Generation("transport error"))`

Nice, thank you! I will update the version of torch in the image as soon as 2.1 is out. Cheers!