ChiikawaSama

3 comments of ChiikawaSama

> > Adding `--disable-flashinfer-sampling` fixed the random-output problem, but when I ran consistency-rate tests I found a large gap from vLLM, and the quality is also poor.
> >
> > I am using the latest version (hash c996e8ccd415f6e1077ace5bc645d19a8dd40203) with `--disable-flashinfer-sampling` added. With the same input, the Qwen2-7B model's output is still random. Could you share a ServerArgs with me for reference?
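Random output across identical inputs is the expected behavior of temperature-based sampling; a quick way to separate sampler nondeterminism from a kernel bug is to force greedy decoding (temperature 0), which must be bit-for-bit reproducible. A minimal stdlib sketch of the distinction (an illustration, not sglang's actual sampler):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits.

    temperature == 0 means greedy decoding (argmax): fully deterministic.
    temperature > 0 samples from the softmax, so different RNG states can
    return different tokens even for the same input.
    """
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.0, 3.0, 2.0]

# Greedy decoding: the same token regardless of seed.
greedy = {sample_token(logits, 0.0, random.Random(s)) for s in range(20)}
print(greedy)  # {1}

# Stochastic sampling: different seeds can yield different tokens.
sampled = {sample_token(logits, 1.0, random.Random(s)) for s in range(200)}
print(len(sampled) > 1)  # True
```

If greedy decoding is still nondeterministic after disabling the FlashInfer sampler, the nondeterminism is upstream of sampling (e.g. in the forward pass), which narrows the search considerably.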

@peri044 Hello, sorry to bother you, but I thought you were the main contributor to this commit (the HF compiled LLM), so you might be interested in this problem. I have...

> Hello [@ChiikawaSama](https://github.com/ChiikawaSama), thanks for sharing this. Is this profile coming from the `generate_from_static_cache` function (because we do initialization of position_ids / post-processing of logits, but there is no GPU activity...