Output of main and server is totally different
When I start the model with the main command and with the server command and ask the same question, e.g. '上海有什么好玩的' ("What is fun to do in Shanghai?"), the outputs are different. How can I make the API response from server the same as the output of the main command?
Does it also change when you repeat it? LLMs are not deterministic, so there is no expectation that the same question yields the same answer. If that is the issue, try specifying a seed value with --seed.
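As a quick check, you can run main twice with the same seed and greedy sampling and compare the output. A minimal sketch (the binary path, model path, and flag values here are assumptions for illustration, not taken from this thread):

```python
import subprocess

# Hypothetical paths; adjust to your local build and model file.
MAIN_BIN = "./main"
MODEL = "./models/model.gguf"
PROMPT = "上海有什么好玩的"  # "What is fun to do in Shanghai?"

def run_once(seed: int) -> str:
    """Run the main CLI once with a fixed seed and return its stdout."""
    result = subprocess.run(
        [MAIN_BIN, "-m", MODEL, "-p", PROMPT,
         "--seed", str(seed), "--temp", "0", "-n", "128"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# With greedy sampling (temp 0) and a fixed seed, two runs of the same
# binary are normally expected to produce identical text.
a = run_once(42)
b = run_once(42)
print("identical" if a == b else "different")
```

If the two runs match but the server output still differs, the problem is in the server-side settings rather than in sampling randomness.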
How can I keep the API response the same as the output of the main command?
It's unclear what settings you used. The README documents the seed and temperature parameters for the API.
Related: https://github.com/ggerganov/llama.cpp/issues/6569#issuecomment-2049673171
I have the same issue. I use temp 0 and a fixed seed, but the results of main and server are consistently different every time. With cache_prompt enabled, the second run from the server (with the cache warm) gives the same result as main, even though the first run differs. This is a problem because I find that results from main usually have better quality.
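For reference, a minimal sketch of querying the server with those same settings (temperature 0, fixed seed, cache_prompt enabled) through its /completion endpoint; the host, port, and parameter values are assumptions for illustration, not taken from this thread:

```python
import json
import urllib.request

# Assumed local server, e.g. started with ./server -m ./models/model.gguf
URL = "http://127.0.0.1:8080/completion"

payload = {
    "prompt": "上海有什么好玩的",
    "temperature": 0,      # greedy sampling
    "seed": 42,            # same fixed seed used with main
    "cache_prompt": True,  # reuse the cached prompt on repeated requests
    "n_predict": 128,
}

# Send the same request twice; with cache_prompt the second run reuses the
# prompt cache, which is where the reported divergence from main shows up.
for i in range(2):
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    print(f"run {i + 1}:", body.get("content", "")[:200])
```

Comparing the two printed completions against the main output makes it easy to reproduce the behavior described above: the first server run differs, the cached second run matches main.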
This issue was closed because it has been inactive for 14 days since being marked as stale.