Claudio Montanari
### System Info

I have reason to believe that this PR, https://github.com/huggingface/text-generation-inference/pull/1729, is causing a 2-3x performance regression in the decoding stage when running EETQ-quantized models on multiple shards with CUDA...