llama.cpp
"Press Ctrl+C to interject at any time" is a bad choice - use something else, maybe [Esc]
I think Ctrl+C is a bad choice to interrupt the output, because Ctrl+C also stops the program.
It has happened to me that I pressed Ctrl+C at the wrong moment and a conversation was lost as a result.
It has happened to me, I think it will have happened to others, it will happen again - and that is because Ctrl+C has this dual function.
A better key for interrupting would be the [Esc] key in my opinion.
No, but it's the shell.
It's the shell ending the program on SIGINT. It's the developers' decision to use SIGINT to give control back to the user. I think it should be done with another key, which would mean not using SIGINT.
If it stays with SIGINT, meaning Ctrl+C, I think SIGINT should also be captured before ending the program, with something like "exit? (y/n or ctrl+c again)". At least that should be configurable with a switch.
I think most of main.cpp has become a huge spaghetti of terminal handling code. It should really be changed to some other kind of UI, like ncurses.
@SlyEcho it's clearly never been about providing UIs to regular users, though? Their focus has been on speed and optimizations, which is absolutely best! Use koboldcpp for a UI.
Actually, in some ways I prefer the simple console UI over the web UIs out there. While koboldcpp is somehow fast in output, it always feeds back in a lot of the conversation, which makes it slow in prompt ingestion. So a really performant console frontend does have its worth and should receive some love in development.
This has happened to me. If you hit Ctrl-C at just the right time you can get stuck waiting while the context is reset and reevaluated and then you'll start to question yourself. Maybe you didn't hold Ctrl down? Maybe you didn't press the C key hard enough? So you finally hit it again and then it exits the program.
I've submitted a couple of pull requests for main that focus on the user experience, but they seem to have gone mostly ignored. I'm not sure if they're just focusing so hard on performance right now that they're taking an "if it ain't broke don't fix it" approach to main or if they really just want to keep it as simple as possible at all costs.
I have one pull request that's been up for a couple of weeks now that improves the user experience through more specific console input handling. It does clean up main.cpp a bit, but at the same time it adds to the console handling code in common.cpp, which may still be undesirable. It processes one key press at a time, which is the approach you'd need to interrupt on Esc or whatever key people wanted to use for interruption. So it's a step towards that, but I've gotten very little feedback and I'm hesitant to put more time into improving the user experience if it's not the desired direction for the project.
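To illustrate what per-key handling could look like, here is a rough POSIX-only sketch. It is not the code from the pull request, and `Key`, `classify_key`, `RawMode`, and `esc_pressed` are hypothetical names. It assumes a termios-capable terminal and uses a simple heuristic: a lone ESC byte is treated as the Esc key, while ESC followed by more bytes in the same read is treated as an escape sequence such as an arrow key.

```cpp
#include <string>
#include <termios.h>
#include <unistd.h>

enum class Key { None, Escape, Sequence, Char };

// Classify the bytes returned by one read() from a raw-mode terminal.
Key classify_key(const std::string & bytes) {
    if (bytes.empty()) {
        return Key::None;
    }
    if (bytes[0] != '\x1b') {
        return Key::Char;
    }
    // Heuristic: ESC alone is the Esc key; ESC plus more bytes is
    // usually a terminal escape sequence (e.g. "\x1b[A" for Up).
    return bytes.size() == 1 ? Key::Escape : Key::Sequence;
}

// RAII guard: switches stdin to non-canonical, no-echo mode and
// restores the previous settings on destruction. ISIG is left on,
// so Ctrl+C still raises SIGINT.
class RawMode {
public:
    RawMode() {
        active_ = tcgetattr(STDIN_FILENO, &saved_) == 0;
        if (!active_) {
            return;                 // stdin is not a terminal
        }
        struct termios raw = saved_;
        raw.c_lflag &= ~(ICANON | ECHO);
        raw.c_cc[VMIN]  = 0;        // read() never blocks...
        raw.c_cc[VTIME] = 0;        // ...and returns 0 if nothing is typed
        tcsetattr(STDIN_FILENO, TCSANOW, &raw);
    }
    ~RawMode() {
        if (active_) {
            tcsetattr(STDIN_FILENO, TCSANOW, &saved_);
        }
    }
    RawMode(const RawMode &) = delete;
    RawMode & operator=(const RawMode &) = delete;

private:
    struct termios saved_{};
    bool active_ = false;
};

// Poll stdin once; returns true if the user pressed Esc.
bool esc_pressed() {
    char buf[8];
    const ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
    return n > 0 &&
           classify_key(std::string(buf, static_cast<size_t>(n))) == Key::Escape;
}
```

A generation loop could create one `RawMode` object before it starts and call `esc_pressed()` between tokens. The heuristic can misfire if the bytes of an escape sequence arrive in separate reads, so a sturdier version would wait briefly after ESC before deciding.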
In the README.md that ggerganov wrote, he states that his goal was a "Plain C/C++ implementation without dependencies". So that sounds like it rules out ncurses. Either way, I'm not even mad if the goal is just to keep main as simple as possible, but I'm hoping the real reason it's going mostly ignored is that everyone with permission to review code and accept a pull request is just busy focusing on the performance stuff.
Still, it would be nice to know if the issue is just that they want to minimize the code that goes into the main example at all costs or if it's just not the priority at the moment. I could probably even submit a pull request for a separate cli ui that has a more polished end user experience if they're just worried about breaking the program they use for testing everything, but maybe that's not even desired?
With StableLM and all these other open source LLMs being released, it might be best to just make a new project that tries to make an interface that can use any ggml-supported text model. It's the 4-bit (and now 5-bit) quantization and the focus on end user devices that makes this project so exciting, but all of the work being done here seems to be pushed back to the official ggml repo (with the exception of llama.cpp itself) and should, I believe (is this right?), also benefit all models running on ggml. So perhaps such a project would be a better point to focus on for end user experience issues.
Thanks for the insights. Yes, it would be really nice to have a console-based interface that could support any ggml-supported LLMs. I would be glad to see something like this happen.
Yes
I agree, using the [Esc] key is a better option.
Is there a way anymore to interrupt generation without killing the program? You used to be able to hit ctrl-c and make a new request without restarting the program and losing context, but sometime in the last month or two that was taken away, and ctrl-c only kills the program now.
I understand not wanting ctrl-c to be the chosen key, but having the ability at all is pretty important.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Not stale. Bad bot.