FlexGen Suggestion: Add support for different decoding strategies (Top P)

Open anujnayyar1 opened this issue 2 years ago • 2 comments

First, thank you for sharing this awesome, easy-to-use work! It’s a great step forward in democratising LLMs.

It would be really helpful in practical applications if we could choose between different decoding strategies.

I believe some of the most useful would be:

Top P
Top K
Contrastive Search
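
For anyone following along, here is a rough, framework-agnostic sketch of what the first two items might look like. It uses plain Python lists rather than FlexGen's tensors, and the function names (`softmax`, `top_k_filter`, `top_p_filter`, `sample_token`) are made up for illustration; none of them are FlexGen APIs. Contrastive search is left out because it needs access to hidden states, not just the output distribution.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalise(probs, kept):
    """Zero every index outside `kept`, then rescale so the rest sums to 1."""
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]


def top_k_filter(probs, k):
    """Top-k: keep only the k most probable tokens (assumes k >= 1)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalise(probs, set(order[:k]))


def top_p_filter(probs, top_p):
    """Top-p (nucleus): keep the smallest set of most probable tokens
    whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return _renormalise(probs, kept)


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Hypothetical sampling step: temperature, optional filters, then draw."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Something like `sample_token(logits, temperature=0.7, top_p=0.9)` would then return one token index per call. A real implementation would presumably do the same thing on batched GPU tensors instead of Python lists.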

All the best,

Anuj Nayyar

Feb 21 '23 06:02 anujnayyar1

I second this. Temperature is good, but extending to different decoding strategies will help with replicating results found in other runtimes.

Mar 26 '23 02:03 brandonvessel

Hi, thanks for the suggestion! The sampling methods can be added here: https://github.com/FMInference/FlexGen/blob/3502de5f251098f02998a5805fcf499aea809135/flexgen/pytorch_backend.py#L280-L284 Feel free to try it yourself. Community contributions are welcome too!

Mar 28 '23 06:03 Ying1123
