zhengkai

I'm having the same problem with ``Meta-Llama-3.1-70B-Instruct`` and **fp8** quantization under high concurrency/RPS (without ``--kv-cache-dtype fp8_e5m2``). After adding ``--chunked-prefill-size 2048``, the error no longer occurs.
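For context, a sketch of the launch command this workaround implies, assuming an SGLang-style server (``--chunked-prefill-size`` and ``--kv-cache-dtype`` are SGLang flags; the model path and port here are illustrative placeholders):

```shell
# Hypothetical launch command illustrating the workaround:
# fp8 weight quantization, default kv-cache dtype (no fp8_e5m2),
# with chunked prefill capped at 2048 tokens to avoid the error
# under high concurrency.
python -m sglang.launch_server \
  --model-path Meta-Llama-3.1-70B-Instruct \
  --quantization fp8 \
  --chunked-prefill-size 2048 \
  --port 30000
```

Capping the chunked-prefill size bounds how many prefill tokens are batched per step, which reduces peak memory pressure during bursts of concurrent requests.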