[Feature Request] Stop inference button
Reference Issues
No response
Summary
Add a button that cancels an in-flight generation. If the response is already sufficient, or the model starts to misbehave, the user can stop it immediately instead of letting the LLM consume additional tokens and time. A sketch of how the cancellation could be wired up follows the examples below.
Basic Example
- Irrelevant or Off-Topic Responses. Use Case: If the model starts generating content that is irrelevant or strays from the intended topic, the user can stop the generation to avoid wasting time or resources. Example: You ask for a summary of a scientific paper, but the model starts discussing unrelated theories.
- Excessive Length. Use Case: When the model's response becomes excessively long and verbose, the user can stop it to get a more concise answer. Example: You request a brief explanation of a concept, but the model begins to write a lengthy essay.
- Repetitive Content. Use Case: When the model begins to repeat itself or generate redundant information, the user can stop the process. Example: You ask for a list of benefits of exercise, but the model keeps repeating the same points.
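A minimal sketch of the cancellation wiring, assuming macai streams completions over `URLSession`; `ChatViewModel`, `send`, and `stop` are hypothetical names for illustration, not macai's actual API:

```swift
import Foundation
import Combine

// Hypothetical view model; macai's real networking layer may differ.
@MainActor
final class ChatViewModel: ObservableObject {
    @Published var output = ""
    @Published var isGenerating = false

    private var generationTask: Task<Void, Never>?

    func send(_ request: URLRequest) {
        isGenerating = true
        generationTask = Task {
            defer { isGenerating = false }
            do {
                let (bytes, _) = try await URLSession.shared.bytes(for: request)
                for try await line in bytes.lines {
                    // Cooperative cancellation: bail out between chunks
                    // as soon as stop() has been called.
                    if Task.isCancelled { break }
                    output += line  // real code would parse SSE/JSON chunks here
                }
            } catch {
                // URLSession reports cancellation as an error; a stopped
                // generation is not a failure, so swallow it here.
            }
        }
    }

    /// Wired to the Stop button in the chat toolbar.
    func stop() {
        generationTask?.cancel()
        generationTask = nil
    }
}
```

Cancelling the task tears down the client side of the stream immediately; whether the server also stops generating depends on the backend (Ollama, for instance, typically aborts generation when the HTTP connection closes).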
Drawbacks
No response
Unresolved questions
No response
Thanks for putting out this app. I'd like to second this. I ran into a case of a model getting stuck "thinking forever" (Qwen 3 quantized). In fact, it went on so long that Macai crashed.
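For the runaway case specifically, and assuming generation runs in a cancellable `Task` as sketched above, a small watchdog could enforce a hard deadline so a model stuck thinking cannot run unbounded; the name and timeout below are illustrative:

```swift
import Foundation

/// Cancels `generationTask` if it is still running after `seconds`.
/// This takes the same code path as the manual Stop button.
func watchdog(for generationTask: Task<Void, Never>, seconds: UInt64) -> Task<Void, Never> {
    Task {
        try? await Task.sleep(nanoseconds: seconds * 1_000_000_000)
        generationTask.cancel()  // no-op if generation already finished
    }
}
```

Calling `watchdog(for: generationTask, seconds: 300)` next to `send` would cap any single generation at five minutes; cancelling a task that has already finished is harmless.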