Output of main and server is totally different
When I start the model with the main command and with the server command and ask the same question, e.g. '上海有什么好玩的' ("What is fun to do in Shanghai?"), the outputs are different. How can I make the API response from server the same as the output of the main command?
Does it also change when you repeat it? LLMs are not deterministic, so there is no expectation that the same question yields the same answer. If that is the issue, try specifying a seed value with --seed.
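As a quick check, you can run main twice with the same seed and greedy sampling and compare the output. A minimal sketch (the binary path, model path, and flag values here are assumptions for illustration, not taken from this thread):

```python
import subprocess

# Hypothetical paths; adjust to your local build and model file.
MAIN_BIN = "./main"
MODEL = "./models/model.gguf"
PROMPT = "上海有什么好玩的"  # "What is fun to do in Shanghai?"

def run_once(seed: int) -> str:
    """Run the main CLI once with a fixed seed and return its stdout."""
    result = subprocess.run(
        [MAIN_BIN, "-m", MODEL, "-p", PROMPT,
         "--seed", str(seed), "--temp", "0", "-n", "128"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# With greedy sampling (temp 0) and a fixed seed, two runs of the same
# binary are normally expected to produce identical text.
a = run_once(42)
b = run_once(42)
print("identical" if a == b else "different")
```

If the two runs match but the server output still differs, the problem is in the server-side settings rather than in sampling randomness.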
How can I keep the API response the same as the output of the main command?
It's unclear what settings you used. The README documents the seed and temperature parameters for the API.
Related: https://github.com/ggerganov/llama.cpp/issues/6569#issuecomment-2049673171
I have the same issue. I use temp 0 and a fixed seed, but the results of main and server are consistently different every time. With cache_prompt enabled, the second run from the server (with the cache warm) gives the same result as main, even though the first run differs. This is a problem because I find that results from main usually have better quality.
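For reference, a minimal sketch of querying the server with those same settings (temperature 0, fixed seed, cache_prompt enabled) through its /completion endpoint; the host, port, and parameter values are assumptions for illustration, not taken from this thread:

```python
import json
import urllib.request

# Assumed local server, e.g. started with ./server -m ./models/model.gguf
URL = "http://127.0.0.1:8080/completion"

payload = {
    "prompt": "上海有什么好玩的",
    "temperature": 0,      # greedy sampling
    "seed": 42,            # same fixed seed used with main
    "cache_prompt": True,  # reuse the cached prompt on repeated requests
    "n_predict": 128,
}

# Send the same request twice; with cache_prompt the second run reuses the
# prompt cache, which is where the reported divergence from main shows up.
for i in range(2):
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    print(f"run {i + 1}:", body.get("content", "")[:200])
```

Comparing the two printed completions against the main output makes it easy to reproduce the behavior described above: the first server run differs, the cached second run matches main.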
This issue was closed because it has been inactive for 14 days since being marked as stale.