text-generation-inference
fix: repack for marlin when single scale is provided
This PR adjusts the conditional for repacking fp8 for marlin so that it also runs when a single scale is provided. This avoids an IndexError in the case where the scales contain only a single value.
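A minimal sketch of the failure mode and the guard (the function name `maybe_expand_scales` and the NumPy stand-in are hypothetical; the real code operates on torch tensors inside the marlin repacking path):

```python
import numpy as np

def maybe_expand_scales(scales: np.ndarray, num_rows: int) -> np.ndarray:
    """Hypothetical illustration: a checkpoint with per-tensor fp8
    quantization stores a single scalar scale. Code that assumes one
    scale per row and indexes into it raises an IndexError on such a
    checkpoint, so we check for the single-scale case first and
    broadcast the scalar to one scale per row before repacking."""
    if scales.size == 1:
        return np.full((num_rows,), scales.reshape(-1)[0], dtype=scales.dtype)
    return scales

# A per-tensor (scalar) scale gets broadcast to one scale per row.
expanded = maybe_expand_scales(np.array(0.5, dtype=np.float32), 4)
```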
not ~~related to: https://github.com/huggingface/text-generation-inference/issues/2388~~
This change doesn't seem to fix neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 for me.
I'm confused that this still doesn't fix the Neural Magic model.
text-generation-launcher --model-id meta-llama/Meta-Llama-3-8B --quantize fp8
is currently working on main. This might have been fixed by something else?
Can we introduce a failing test before fixing this?
Closing as stale, feel free to reopen.