sglang

[Feature] support min_p sampling

Open 81549361 opened this issue 6 months ago • 2 comments

Motivation

The min_p sampling parameter is becoming quite popular. It is conceptually simple and "makes sense", and (at least anecdotally, according to many model fine-tuners and users in the LocalLlama community) it tends to perform better than the usual top_p + top_k approach. The READMEs of many new model fine-tunes and merges on Hugging Face recommend using min_p instead of top_p and top_k.

Some of the required code has already been implemented in FlashInfer: https://github.com/flashinfer-ai/flashinfer/pull/422

Related resources

vLLM documents the parameter in its sampling parameters (https://github.com/vllm-project/vllm/blob/8ea5e44a435e8731fd6f5ba4c329dd112752532a/vllm/sampling_params.py#L64C9-L66C57):

min_p: Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
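Written as a formula, the quoted definition amounts to the rule below. This is a sketch: the symbols $p_i$ (token probabilities), $V$ (vocabulary), $S$ (kept set) and $p_{\min}$ (the min_p value) are introduced here for illustration, and it assumes the surviving probabilities are renormalized before sampling, as is usual for truncation samplers.

$$
S = \{\, i \in V : p_i \ge p_{\min} \cdot \max_{k \in V} p_k \,\},
\qquad
\tilde p_i =
\begin{cases}
\dfrac{p_i}{\sum_{j \in S} p_j} & \text{if } i \in S, \\
0 & \text{otherwise,}
\end{cases}
\qquad p_{\min} \in [0, 1].
$$

With $p_{\min} = 0$ the threshold is $0$, so every token is kept and the filter is disabled; the most likely token always satisfies the condition, so $S$ is never empty.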

So, for example, a min_p of 0.07 means that any token whose probability is less than 7% of the highest-probability token's probability is disqualified. A min_p of 0.5 means that a token must have at least half the probability of the most likely token, otherwise it is disqualified. In other words, min_p sets a minimum fraction of the most likely token's probability; any token below that fraction cannot be sampled.
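A minimal Python sketch of the rule described above, for illustration only. `min_p_filter` and `sample_min_p` are hypothetical helper names, not sglang, vLLM, or FlashInfer APIs; the sketch works on a single plain list of probabilities rather than batched GPU tensors, and it assumes the kept probabilities are renormalized before sampling.

```python
import random
from typing import Optional, Sequence


def min_p_filter(probs: Sequence[float], min_p: float) -> list[float]:
    """Hypothetical helper: drop tokens below min_p * max(probs), then renormalize.

    Assumes `probs` is a valid probability distribution (non-negative, sums to 1).
    """
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    # Threshold is relative to the most likely token's probability.
    threshold = min_p * max(probs)
    # Tokens strictly below the threshold are disqualified (set to 0).
    # With min_p == 0 the threshold is 0, so nothing is removed.
    kept = [p if p >= threshold else 0.0 for p in probs]
    # The most likely token always survives, so the total is positive.
    total = sum(kept)
    return [p / total for p in kept]


def sample_min_p(
    probs: Sequence[float], min_p: float, rng: Optional[random.Random] = None
) -> int:
    """Hypothetical helper: sample one token index after min_p filtering."""
    chooser = rng if rng is not None else random
    filtered = min_p_filter(probs, min_p)
    # Zero-weight (disqualified) tokens are never chosen by choices().
    return chooser.choices(range(len(filtered)), weights=filtered, k=1)[0]
```

For example, with probabilities [0.5, 0.3, 0.15, 0.05] and min_p = 0.5, the threshold is 0.25, so only the first two tokens survive and they are renormalized to roughly 0.625 and 0.375. With min_p = 0.2 the threshold drops to 0.1 and only the last token is removed.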

Please see the following pull requests for more information:

vLLM: https://github.com/vllm-project/vllm/pull/1642
text-generation-webui: https://github.com/oobabooga/text-generation-webui/pull/4449
llama.cpp: https://github.com/ggerganov/llama.cpp/pull/3841



81549361, Aug 13 '24 09:08