Daniël de Kok

Results: 123 comments by Daniël de Kok

Any chance you could try `docker pull ghcr.io/huggingface/text-generation-inference:latest-rocm`? ROCm FP8 support was improved yesterday: https://github.com/huggingface/text-generation-inference/pull/2588

Did you try removing the double dashes from the model name `models--lllyasviel--omost-llama-3-8b-4bits`, as suggested in the error?
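For illustration, a name of this shape looks like a Hugging Face Hub cache folder, which is laid out as `models--<namespace>--<repo>`. Assuming that is where the name came from (the comment above does not confirm it), a minimal sketch for recovering the plain repo id could look like this. The helper `cache_dir_to_repo_id` is hypothetical and not part of any library:

```python
# Hypothetical helper, not part of huggingface_hub or TGI.
# Assumption: the name follows the Hub cache folder layout
# "<repo_type>s--<namespace>--<repo_name>", e.g. "models--org--name".
KNOWN_PREFIXES = {"models", "datasets", "spaces"}


def cache_dir_to_repo_id(folder_name: str) -> str:
    """Turn a cache folder name into a repo id such as "org/name"."""
    parts = folder_name.split("--")
    if len(parts) < 2 or parts[0] not in KNOWN_PREFIXES:
        raise ValueError(f"not a cache-style folder name: {folder_name!r}")
    # Drop the repo-type prefix and rejoin the remaining parts with "/".
    return "/".join(parts[1:])


if __name__ == "__main__":
    print(cache_dir_to_repo_id("models--lllyasviel--omost-llama-3-8b-4bits"))
    # prints: lllyasviel/omost-llama-3-8b-4bits
```

The resulting id contains no double dashes, so it may be worth passing that as the model name instead.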

Any chance you could run with `CUDA_LAUNCH_BLOCKING=1`, which may help pinpoint the source of the error? It's also worth testing with `USE_CUTLASS_W8A8=1`, which will use CUTLASS GEMM kernels instead (only...