Confusing `finish_reason` when using the `max_tokens` property in the `v1/chat/completions` endpoint
LocalAI version:
v2.20.1 a9c521eb41dc2dd63769e5362f05d9ab5d8bec50
Environment, CPU architecture, OS, and Version:
- OS: 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
- ENV: Docker version 26.0.1, build d260a54
- HW: i9-10900F, RTX3080, 128GB RAM
Describe the bug
When a request to the `v1/chat/completions` endpoint sets the `max_tokens` parameter, the completion may be cut off at the token limit, but `finish_reason` remains `stop` instead of changing to `length`, making it difficult to determine whether the answer is complete.
Additionally, even without the `max_tokens` property, the response can still be cut off while `finish_reason` remains `stop`.
To Reproduce
- Send a request to the `v1/chat/completions` endpoint with the `max_tokens` property set to a specific value (e.g., 20).
- Observe the response: the completion is truncated, but `finish_reason` is still `stop`.
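
For reference, a minimal reproduction sketch (the host, port, and model name are assumptions; adjust them to your deployment):

```python
# Minimal reproduction sketch. Assumes a LocalAI instance on localhost:8080
# and an installed chat model named "gpt-4" -- adjust both to your setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain how a CPU cache works."}],
        "max_tokens": 20,  # low limit to force truncation
    },
    timeout=120,
)
choice = resp.json()["choices"][0]
print(choice["message"]["content"])
print(choice["finish_reason"])
# Observed: "stop", even though the printed text is visibly cut off.
# Expected: "length" when generation stops because the token limit was hit.
```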
Expected behavior
When the `max_tokens` property is set and the completion is cut off at the limit, the `finish_reason` should be `length` instead of `stop`, so that clients can reliably tell whether the answer is complete. This would match the OpenAI API, where `finish_reason` is `length` whenever generation stops because the token limit was reached.