Cody Yu
@simon-mo what do we do if isort and yapf are conflicting?
@peng1999 please let me know when this is available for the final review and I'll try to get this in asap. Thanks
@peng1999 can you look into the CI failure?
That's understandable. We could disable FlashInfer sampling in this case. Meanwhile, we may want to add a note somewhere encouraging users to disable it when they find discrepant (and unacceptable) outputs....
Hmm, I suppose this would be the case everywhere then... I'd then suggest the following: 1. We disable FlashInfer sampling by default and use the env variable to enable it in...
> @comaniac it looked like he shared top-logprobs already from the gptq test? If it isn't using logprobs, I agree we should change that

Yeah, ideally we should leverage logprobs...
> The error comes from sampling. We can not guarantee the output will be all matched even if the target model is the same as the draft model because sampling...
> Can we add instructions in the [GitHub issues template](https://github.com/vllm-project/vllm/blob/main/.github/ISSUE_TEMPLATE/400-bug%20report.yml) so users can share their logs upon encountering such errors?

Good point. Will do.
@DarkLight1337 added to issue template. PTAL.
It should be fine as we never load it automatically? But yeah, you may get a virus if someone posts a malicious pickle file to an issue...