llama.cpp
Suppress output that isn't from the model
I want to integrate this into a slim chat system, so it would be nice if the app could output only the text from the model, for example via a -q ("quiet") flag on run.
For testing purposes, you can just comment out all the printf calls in main.cpp, run make again, and then run ./main
Another option is to use fprintf(stderr, ...) everywhere except in the one place that prints tokens from the model. According to the POSIX standard, stderr is not limited to errors; it can be used for diagnostic output too.
One can use ./main ... 2>/dev/null to suppress any diagnostic output.
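A minimal sketch of the stderr/stdout split described above, in case it helps. This is not the actual llama.cpp code: log_diag, emit_token and run_demo are hypothetical names, and the tokens are hard-coded rather than produced by a model.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: diagnostic messages go to the diagnostic stream.
static void log_diag(std::FILE *diag, const std::string &msg) {
    assert(diag != nullptr);
    std::fprintf(diag, "%s\n", msg.c_str());
}

// Hypothetical helper: model text goes to the output stream.
static void emit_token(std::FILE *out, const std::string &token) {
    assert(out != nullptr);
    std::fputs(token.c_str(), out);
    std::fflush(out);  // flush so each token is visible as soon as it is written
}

// Stand-in for a generation loop; defaults route text to stdout and
// diagnostics to stderr, so the two can be redirected independently.
static void run_demo(std::FILE *out = stdout, std::FILE *diag = stderr) {
    log_diag(diag, "demo: loading model (placeholder message)");
    const std::vector<std::string> tokens = {"Hello", ",", " world", "!", "\n"};
    for (const std::string &t : tokens) {
        emit_token(out, t);
    }
    log_diag(diag, "demo: generated " + std::to_string(tokens.size()) + " tokens");
}
```

If run_demo() is called from main and the binary is started with 2>/dev/null, only the text written through emit_token should reach the terminal; without the redirect, both streams show up interleaved.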
PR https://github.com/ggerganov/llama.cpp/pull/48 does exactly this.