FlexGen
Suggestion: Add support for different decoding strategies (Top P)
Firstly, thank you for sharing this awesome and easy-to-use work!! It's a great step forward in democratising LLMs.
It would be really helpful in practical applications if we could choose between different decoding strategies.
I believe some of the most useful would be:
- Top P
- Top K
- Contrastive Search
All the best,
Anuj Nayyar
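For reference, top-k and top-p are filters applied to the next-token distribution before sampling. Top-k keeps only the k most probable tokens. Top-p (nucleus) keeps the smallest set of most probable tokens whose cumulative probability reaches p. Below is a minimal plain-Python sketch on a list of logits, assuming a single sequence and no batching. The function names `softmax`, `top_k_filter`, `top_p_filter` and `sample` are hypothetical and are not part of FlexGen. A real implementation would work on batched PyTorch tensors.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; k <= 0 disables the filter."""
    if k <= 0 or k >= len(probs):
        return list(probs)
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative mass is >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token id. temperature must be > 0; top_k=0 and top_p=1.0 disable the filters."""
    probs = softmax([x / temperature for x in logits])
    probs = top_k_filter(probs, top_k)
    if top_p < 1.0:
        probs = top_p_filter(probs, top_p)
    # Zero-weight (filtered-out) tokens are never chosen by random.choices.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Here temperature is applied before the filters. Other orderings are possible. With `top_k=1` the sampler reduces to greedy decoding.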
I second this. Temperature is good, but extending to different decoding strategies will help with replicating results found in other runtimes.
Hi, thanks for the suggestion! The sampling methods can be added here: https://github.com/FMInference/FlexGen/blob/3502de5f251098f02998a5805fcf499aea809135/flexgen/pytorch_backend.py#L280-L284 Feel free to try it by yourself. Community contributions are welcome too!
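Contrastive search, the third strategy requested above, is not a filter on the distribution. Among the top-k candidate tokens, it picks the one that maximizes `(1 - alpha) * p(v) - alpha * max_j cos(h_v, h_j)`. This trades model confidence against the similarity of the candidate's hidden state `h_v` to the hidden states `h_j` of the previous tokens. The sketch below shows only that selection step, in plain Python. `contrastive_step` is a hypothetical name, and `alpha=0.6` is chosen only for illustration. Obtaining a hidden state for each candidate usually needs an extra forward pass over the candidates, so adding this strategy would presumably touch more of FlexGen than the sampling lines linked above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def contrastive_step(candidates, context_hidden, alpha=0.6):
    """Pick the next token by contrastive search (hypothetical helper).

    candidates: list of (token_id, prob, hidden_vector) for the top-k tokens.
    context_hidden: hidden vectors of the tokens generated so far.
    alpha: weight of the degeneration penalty; alpha=0 gives greedy decoding.
    """
    best_token, best_score = None, -math.inf
    for token_id, prob, hidden in candidates:
        # Degeneration penalty: max similarity to any previous hidden state.
        penalty = max((cosine(hidden, h) for h in context_hidden), default=0.0)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token_id, score
    return best_token
```

For example, if the most probable candidate has a hidden state identical to an earlier token's, a large enough `alpha` makes the search prefer a less probable but less repetitive candidate.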