
Suggestion: Add support for different decoding strategies (Top P)

Open anujnayyar1 opened this issue 2 years ago • 2 comments

Firstly, thank you for sharing this awesome and easy-to-use work! It's a great step forward in democratising LLMs.

It would be really helpful in practical applications if we could choose between different decoding strategies.

I believe some of the most useful would be:

  • Top P
  • Top K
  • Contrastive Search

All the best,

Anuj Nayyar

anujnayyar1 • Feb 21 '23 06:02

I second this. Temperature sampling is good, but supporting other decoding strategies would help with replicating results obtained in other runtimes.

brandonvessel • Mar 26 '23 02:03

Hi, thanks for the suggestion! The sampling methods can be added here: https://github.com/FMInference/FlexGen/blob/3502de5f251098f02998a5805fcf499aea809135/flexgen/pytorch_backend.py#L280-L284 Feel free to try it yourself. Community contributions are welcome too!
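
For anyone picking this up, here is a minimal sketch of what Top K and Top P (nucleus) filtering do to a next-token distribution. It is written in plain Python so it runs without dependencies; it is not FlexGen code, and the function names (`softmax`, `top_k_top_p_filter`, `sample_token`) are hypothetical. A real patch would apply the same logic to the PyTorch logits tensor at the linked lines. Contrastive Search is left out because it also needs the model's hidden states, not just the logits.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Zero out tokens outside the top-k / nucleus set, then renormalize.

    top_k=0 disables top-k filtering; top_p=1.0 disables top-p filtering.
    """
    # Token ids sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Stop once the kept tokens cover at least top_p of the mass.
        if cumulative >= top_p:
            break

    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token id using temperature, top-k and top-p together."""
    probs = top_k_top_p_filter(softmax(logits, temperature), top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `sample_token(logits, temperature=0.7, top_k=50, top_p=0.9)` would first restrict the candidates to the 50 most likely tokens, then to the smallest prefix of those covering 90% of the probability mass, and sample from what remains.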

Ying1123 • Mar 28 '23 06:03