zhengkai

I'm having the same problem with ``Meta-Llama-3.1-70B-Instruct`` and **fp8** quantization under high concurrency/RPS (without ``--kv-cache-dtype fp8_e5m2``). After adding ``--chunked-prefill-size 2048``, the error no longer occurs.
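For context, a sketch of the launch command this workaround implies, assuming an SGLang-style server (``--chunked-prefill-size`` and ``--kv-cache-dtype`` are SGLang flags; the model path and port here are illustrative placeholders):

```shell
# Hypothetical launch command illustrating the workaround:
# fp8 weight quantization, default kv-cache dtype (no fp8_e5m2),
# with chunked prefill capped at 2048 tokens to avoid the error
# under high concurrency.
python -m sglang.launch_server \
  --model-path Meta-Llama-3.1-70B-Instruct \
  --quantization fp8 \
  --chunked-prefill-size 2048 \
  --port 30000
```

Capping the chunked-prefill size bounds how many prefill tokens are batched per step, which reduces peak memory pressure during bursts of concurrent requests.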