Adding ignore_eos_token support in Chat Completions API Schema
Description
ignore_eos_token is a commonly used additional parameter that helps standardize LLM benchmarks by forcing requests to generate a consistent output sequence length.
- Will this change the current API? How?
It will add ignore_eos_token as an additional optional field in the request body.
- Who will benefit from this enhancement?
Anyone who is running benchmarks or trying to gain a better understanding of performance.
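A minimal sketch of what a Chat Completions request body with the proposed field might look like. The model name, message content, and token limit here are illustrative placeholders, not part of the proposal:

```python
import json

# Hypothetical request body; "ignore_eos_token" is the proposed optional field.
payload = {
    "model": "my-model",  # placeholder model id
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "max_tokens": 256,
    # Proposed: when true, generation does not stop at the EOS token,
    # so the request produces a consistent number of output tokens.
    "ignore_eos_token": True,
}

print(json.dumps(payload, indent=2))
```

Clients that do not set the field would keep the current behavior, since it is optional.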
References
- https://docs.djl.ai/master/docs/serving/serving/docs/lmi/user_guides/lmi_input_output_schema.html — the same feature is already supported under "Additional LMI Dist Generation parameters" and "Additional vLLM Generation Parameters". "Additional TensorRT-LLM Generation Parameters" also has a 'min_length' flag that achieves similar behavior.
@sindhuvahinis