[Feature Request] Stop inference button
Reference Issues
No response
Summary
Add a button that cancels an in-flight generation. If the response is already sufficient, or the model starts to misbehave, the user can stop it immediately instead of letting the LLM consume additional tokens and time. A sketch of how the cancellation could be wired up follows the examples below.
Basic Example
- Irrelevant or Off-Topic Responses. Use Case: If the model starts generating content that is irrelevant or strays from the intended topic, the user can stop the generation to avoid wasting time or resources. Example: You ask for a summary of a scientific paper, but the model starts discussing unrelated theories.
- Excessive Length. Use Case: When the model's response becomes excessively long and verbose, the user can stop it to get a more concise answer. Example: You request a brief explanation of a concept, but the model begins to write a lengthy essay.
- Repetitive Content. Use Case: When the model begins to repeat itself or generate redundant information, the user can stop the process. Example: You ask for a list of benefits of exercise, but the model keeps repeating the same points.
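A minimal sketch of the cancellation wiring, assuming macai streams completions over `URLSession`; `ChatViewModel`, `send`, and `stop` are hypothetical names for illustration, not macai's actual API:

```swift
import Foundation
import Combine

// Hypothetical view model; macai's real networking layer may differ.
@MainActor
final class ChatViewModel: ObservableObject {
    @Published var output = ""
    @Published var isGenerating = false

    private var generationTask: Task<Void, Never>?

    func send(_ request: URLRequest) {
        isGenerating = true
        generationTask = Task {
            defer { isGenerating = false }
            do {
                let (bytes, _) = try await URLSession.shared.bytes(for: request)
                for try await line in bytes.lines {
                    // Cooperative cancellation: bail out between chunks
                    // as soon as stop() has been called.
                    if Task.isCancelled { break }
                    output += line  // real code would parse SSE/JSON chunks here
                }
            } catch {
                // URLSession reports cancellation as an error; a stopped
                // generation is not a failure, so swallow it here.
            }
        }
    }

    /// Wired to the Stop button in the chat toolbar.
    func stop() {
        generationTask?.cancel()
        generationTask = nil
    }
}
```

Cancelling the task tears down the client side of the stream immediately; whether the server also stops generating depends on the backend (Ollama, for instance, typically aborts generation when the HTTP connection closes).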
Drawbacks
No response
Unresolved questions
No response
Thanks for putting out this app. I'd like to second this. I ran into a case of a model getting stuck "thinking forever" (Qwen 3 quantized). In fact, it went on so long that Macai crashed.
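For the runaway case specifically, and assuming generation runs in a cancellable `Task` as sketched above, a small watchdog could enforce a hard deadline so a model stuck thinking cannot run unbounded; the name and timeout below are illustrative:

```swift
import Foundation

/// Cancels `generationTask` if it is still running after `seconds`.
/// This takes the same code path as the manual Stop button.
func watchdog(for generationTask: Task<Void, Never>, seconds: UInt64) -> Task<Void, Never> {
    Task {
        try? await Task.sleep(nanoseconds: seconds * 1_000_000_000)
        generationTask.cancel()  // no-op if generation already finished
    }
}
```

Calling `watchdog(for: generationTask, seconds: 300)` next to `send` would cap any single generation at five minutes; cancelling a task that has already finished is harmless.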