Add min_p
Description of the feature request:
Add the min_p sampler as an option in generation_config. See https://arxiv.org/abs/2407.01082
What problem are you trying to solve with this feature?
This sampler effectively solves the problem of incoherent outputs at high temperatures, allowing for better model creativity and stability with temperatures well above 1.0.
Any other information you'd like to share?
No response
Is this issue about an API limitation? If not, can I try to solve it?
Well, there is no parameter in the Gemini API documentation for setting min_p directly. We could apply min_p ourselves via the dynamic truncation described in the paper above, but that requires the top_k candidate tokens (and their probabilities), which are not included in the generated response. Any other ideas for implementing it?
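For context, the truncation rule itself is simple to sketch client-side, assuming you somehow had the candidate-token probabilities (the distribution below is made up for illustration; nothing here comes from the Gemini API):

```python
def min_p_filter(probs, min_p=0.1):
    """Dynamic truncation from the min_p paper: keep only tokens whose
    probability is at least min_p times the top token's probability,
    then renormalize the survivors before sampling."""
    p_max = max(probs.values())
    threshold = min_p * p_max
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Hypothetical next-token distribution, purely for demonstration.
probs = {"the": 0.5, "a": 0.2, "dog": 0.05, "xylophone": 0.001}
filtered = min_p_filter(probs, min_p=0.1)  # "xylophone" falls below 0.1 * 0.5
```

Note the threshold scales with the model's confidence: when the top token is near-certain, almost everything else is cut; when the distribution is flat, more candidates survive. That is exactly what we cannot do here without the per-token probabilities in the response.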
So, the Google GenAI Python library doesn't yet support min_p sampling, a technique that helps the model stay creative without producing incoherent output at higher temperatures (temperature controls randomness). The idea is to filter out the very unlikely next-token candidates before the model samples one. Since the library doesn't have this built in, you'd have to work around it: make multiple API calls to estimate the probability of each candidate next token, then apply the truncation from the paper manually. That is resource-heavy and inefficient, but possible if you really want to experiment with it. Hopefully Google adds it natively soon to make this easier.
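The "multiple API calls" workaround could be sketched roughly as follows. `generate_one_token` is a hypothetical placeholder for an API call that returns a single sampled next token; the stub below stands in for it so the sketch runs offline, and the whole approach costs one request per sample:

```python
import random
from collections import Counter

def estimate_next_token_probs(generate_one_token, prompt, n_samples=500):
    """Empirically estimate the next-token distribution by sampling the
    model repeatedly and counting which token comes back each time.
    `generate_one_token` is an assumed callable, not a real SDK method."""
    counts = Counter(generate_one_token(prompt) for _ in range(n_samples))
    return {tok: c / n_samples for tok, c in counts.items()}

# Offline stub standing in for the model, with a fixed seed.
rng = random.Random(0)
def fake_generate(prompt):
    return rng.choices(["the", "a", "dog"], weights=[0.6, 0.3, 0.1])[0]

probs = estimate_next_token_probs(fake_generate, "Once upon a time")
```

The estimated distribution could then be fed into the min_p truncation from the paper, but the sample count needed for rare tokens makes this impractical beyond small experiments.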
You can see an implementation in #724, although it is very vague.