Daniël de Kok

Results: 143 comments by Daniël de Kok

Any chance you could test TGI 3.1.1? We fixed two prefix caching edge cases that can lead to long-term corruption.

Could you check the size of the key-value cache in both cases? The memory freed up by quantization is used to increase the size of the key-value cache, so that...

That's odd, the KV-cache size is logged unconditionally at the info level during warmup. It was only added in TGI 2.4.0, so the message wouldn't be logged in 2.0.5 (though the...