Confusing `finish_reason` when using the `max_tokens` property in the `v1/chat/completions` endpoint
LocalAI version:
v2.20.1 a9c521eb41dc2dd63769e5362f05d9ab5d8bec50
Environment, CPU architecture, OS, and Version:
- OS: 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
- ENV: Docker version 26.0.1, build d260a54
- HW: i9-10900F, RTX3080, 128GB RAM
Describe the bug
When a request to the `v1/chat/completions` endpoint sets the `max_tokens` parameter, the completion may be cut off at the token limit, but `finish_reason` remains `stop` instead of changing to `length`, making it difficult to determine whether the answer is complete.
Additionally, even without the `max_tokens` property, the response can still be cut off while `finish_reason` remains `stop`.
To Reproduce
- Send a request to the `v1/chat/completions` endpoint with the `max_tokens` property set to a specific value (e.g., 20).
- Observe the response: the completion is truncated, but `finish_reason` is still `stop`.
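
For reference, a minimal reproduction sketch (the host, port, and model name are assumptions; adjust them to your deployment):

```python
# Minimal reproduction sketch. Assumes a LocalAI instance on localhost:8080
# and an installed chat model named "gpt-4" -- adjust both to your setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain how a CPU cache works."}],
        "max_tokens": 20,  # low limit to force truncation
    },
    timeout=120,
)
choice = resp.json()["choices"][0]
print(choice["message"]["content"])
print(choice["finish_reason"])
# Observed: "stop", even though the printed text is visibly cut off.
# Expected: "length" when generation stops because the token limit was hit.
```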
Expected behavior
When the `max_tokens` property is set and the completion is cut off at the limit, the `finish_reason` should be `length` instead of `stop`, so that clients can reliably tell whether the answer is complete. This would match the OpenAI API, where `finish_reason` is `length` whenever generation stops because the token limit was reached.