text-generation-inference
fix: repack for marlin when single scale is provided
This PR adjusts the conditional for repacking fp8 for marlin so that it also runs when a single scale is provided. This avoids an IndexError in the case where the scales contain only a single value.
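A minimal sketch of the failure mode and the guard (the function name `maybe_expand_scales` and the NumPy stand-in are hypothetical; the real code operates on torch tensors inside the marlin repacking path):

```python
import numpy as np

def maybe_expand_scales(scales: np.ndarray, num_rows: int) -> np.ndarray:
    """Hypothetical illustration: a checkpoint with per-tensor fp8
    quantization stores a single scalar scale. Code that assumes one
    scale per row and indexes into it raises an IndexError on such a
    checkpoint, so we check for the single-scale case first and
    broadcast the scalar to one scale per row before repacking."""
    if scales.size == 1:
        return np.full((num_rows,), scales.reshape(-1)[0], dtype=scales.dtype)
    return scales

# A per-tensor (scalar) scale gets broadcast to one scale per row.
expanded = maybe_expand_scales(np.array(0.5, dtype=np.float32), 4)
```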
not ~~related to: https://github.com/huggingface/text-generation-inference/issues/2388~~
This change doesn't seem to fix neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 for me.
I'm confused that this still doesn't fix the Neural Magic model.
text-generation-launcher --model-id meta-llama/Meta-Llama-3-8B --quantize fp8
is currently working on main. This might have been fixed by something else?
Can we introduce a failing test before fixing this?
Closing as stale, feel free to reopen.