Adding ignore_eos_token support in Chat Completions API Schema
Description
ignore_eos_token is a commonly used additional parameter that helps standardize LLM benchmarks by forcing requests to generate a consistent output sequence length.
- Will this change the current API? How?
It will add ignore_eos_token as an additional optional field in the request body.
- Who will benefit from this enhancement?
Anyone who is running benchmarks or trying to gain a better understanding of performance.
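A minimal sketch of what a Chat Completions request body with the proposed field might look like. The model name, message content, and token limit here are illustrative placeholders, not part of the proposal:

```python
import json

# Hypothetical request body; "ignore_eos_token" is the proposed optional field.
payload = {
    "model": "my-model",  # placeholder model id
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "max_tokens": 256,
    # Proposed: when true, generation does not stop at the EOS token,
    # so the request produces a consistent number of output tokens.
    "ignore_eos_token": True,
}

print(json.dumps(payload, indent=2))
```

Clients that do not set the field would keep the current behavior, since it is optional.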
References
- https://docs.djl.ai/master/docs/serving/serving/docs/lmi/user_guides/lmi_input_output_schema.html — the same feature is already supported under "Additional LMI Dist Generation parameters" and "Additional vLLM Generation Parameters". "Additional TensorRT-LLM Generation Parameters" also has a 'min_length' flag that achieves similar behavior.
@sindhuvahinis