ChiikawaSama

3 comments of ChiikawaSama

> > Adding `--disable-flashinfer-sampling` fixed the random-output problem, but when I ran consistency-rate tests I found a large gap from vLLM, and the quality is also poor.
> >
> > I am using the latest version (hash c996e8ccd415f6e1077ace5bc645d19a8dd40203) with `--disable-flashinfer-sampling` added. With the same input, the Qwen2-7B model's output is still random. Could you share a ServerArgs with me for reference?
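Random output across identical inputs is the expected behavior of temperature-based sampling; a quick way to separate sampler nondeterminism from a kernel bug is to force greedy decoding (temperature 0), which must be bit-for-bit reproducible. A minimal stdlib sketch of the distinction (an illustration, not sglang's actual sampler):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits.

    temperature == 0 means greedy decoding (argmax): fully deterministic.
    temperature > 0 samples from the softmax, so different RNG states can
    return different tokens even for the same input.
    """
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.0, 3.0, 2.0]

# Greedy decoding: the same token regardless of seed.
greedy = {sample_token(logits, 0.0, random.Random(s)) for s in range(20)}
print(greedy)  # {1}

# Stochastic sampling: different seeds can yield different tokens.
sampled = {sample_token(logits, 1.0, random.Random(s)) for s in range(200)}
print(len(sampled) > 1)  # True
```

If greedy decoding is still nondeterministic after disabling the FlashInfer sampler, the nondeterminism is upstream of sampling (e.g. in the forward pass), which narrows the search considerably.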

@peri044 Hello, sorry to bother you, but I thought you were the main contributor to this commit (the HF compiled LLM), so you might be interested in this problem. I have...

> Hello [@ChiikawaSama](https://github.com/ChiikawaSama), thanks for sharing this. Is this profile coming from the `generate_from_static_cache` function (because we do initialization of position_ids / post-processing of logits, but there is no GPU activity...