llm-inference
Expose the model's generate parameters via the API server
```yaml
generate_kwargs:
  do_sample: true
  max_new_tokens: 128
  min_new_tokens: 16
  temperature: 0.7
  repetition_penalty: 1.1
  top_p: 0.8
  top_k: 50
```
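
Below is a minimal sketch (not this project's actual server code) of how such defaults could be exposed through an HTTP endpoint: the YAML values act as server-side defaults, and a request may override individual keys before they are forwarded to Hugging Face's `model.generate()`. The endpoint path, request fields, and the `gpt2` checkpoint are illustrative assumptions.

```python
# Hypothetical API server exposing generate_kwargs; built on FastAPI and
# transformers, with the YAML defaults mirrored as a Python dict.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

DEFAULT_GENERATE_KWARGS = {
    "do_sample": True,
    "max_new_tokens": 128,
    "min_new_tokens": 16,
    "temperature": 0.7,
    "repetition_penalty": 1.1,
    "top_p": 0.8,
    "top_k": 50,
}

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

class GenerateRequest(BaseModel):
    prompt: str
    # Optional per-request overrides for the configured defaults.
    generate_kwargs: dict = {}

@app.post("/generate")
def generate(req: GenerateRequest):
    # Request-level values take precedence over the YAML defaults.
    kwargs = {**DEFAULT_GENERATE_KWARGS, **req.generate_kwargs}
    inputs = tokenizer(req.prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **kwargs)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"generated_text": text}
```

With a server shaped like this, a client could override a single parameter per request, e.g. sending `{"prompt": "Hello", "generate_kwargs": {"temperature": 0.2}}` to `/generate` while all other settings fall back to the configured defaults.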