Daniël de Kok comments




Daniël de Kok

TGI does not support FP8 quantized models on ROCm

Any chance you could try `docker pull ghcr.io/huggingface/text-generation-inference:latest-rocm`? ROCm FP8 support was improved yesterday: https://github.com/huggingface/text-generation-inference/pull/2588

How to use the model's checkpoint in a local folder?

Did you try removing the double dashes from the model name `models--lllyasviel--omost-llama-3-8b-4bits`, as suggested in the error?
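For illustration, a minimal sketch of what dropping the double dashes could look like, assuming the folder name follows the Hugging Face Hub cache convention (`models--<org>--<name>`), where `--` stands in for the `/` of the repo ID. `cache_dir_to_repo_id` is a hypothetical helper written for this example; it is not part of TGI or `huggingface_hub`.

```python
def cache_dir_to_repo_id(folder_name: str) -> str:
    """Turn a Hub cache folder name like 'models--org--name' into 'org/name'.

    Hypothetical helper: assumes the 'models--<org>--<name>' cache layout.
    """
    prefix = "models--"
    if not folder_name.startswith(prefix):
        raise ValueError(f"not a model cache folder name: {folder_name!r}")
    # What follows the prefix is the repo ID with '/' encoded as '--'.
    return folder_name[len(prefix):].replace("--", "/")


print(cache_dir_to_repo_id("models--lllyasviel--omost-llama-3-8b-4bits"))
# prints: lllyasviel/omost-llama-3-8b-4bits
```

The resulting string is the form a repo ID is normally written in; whether to pass it or a path to the snapshot directory depends on how the checkpoint was downloaded.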

CUDA: an illegal memory access was encountered with Mistral FP8 Marlin kernels on NVIDIA driver 535.216.01 (AWS Sagemaker Real-time Inference)

Any chance you could run with `CUDA_LAUNCH_BLOCKING=1`, which may help pinpoint the source of the error? It's also worth testing with `USE_CUTLASS_W8A8=1`, which will use CUTLASS GEMM kernels instead (only...
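A rough sketch of how the two environment variables mentioned above could be set when launching from Python. `debug_env` is a hypothetical helper for this example, and the launcher invocation in the trailing comment is an assumption about a typical setup, with a placeholder model ID.

```python
import os
from typing import Dict, Optional


def debug_env(use_cutlass: bool = False,
              base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the debugging variables set."""
    env = dict(os.environ if base is None else base)
    # Make CUDA kernel launches synchronous, so an error is reported
    # at the call that caused it rather than at a later one.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    if use_cutlass:
        # Variable named in the comment above; switches to CUTLASS kernels.
        env["USE_CUTLASS_W8A8"] = "1"
    return env


env = debug_env(use_cutlass=True, base={})
print(sorted(env.items()))

# Assumed usage (not executed here; the model ID is a placeholder):
# subprocess.run(["text-generation-launcher", "--model-id", "<model-id>"],
#                env=debug_env(use_cutlass=True))
```

Trying the two variables in separate runs keeps the results easier to interpret: the first run localizes the failing call, the second checks whether the error goes away with different kernels.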