Feature Request: echo=true in llama-server
Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
llama-server already accepts API calls with logprobs=1, but it would be very useful to also support echo=True, as was available for older OpenAI completion models such as davinci-002.
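For illustration, here is a minimal sketch of what such a request could look like against a locally running llama-server. The endpoint URL is assumed, the echo parameter is exactly the feature being requested (it is not supported today), and the response fields mentioned in the comments mirror the legacy OpenAI completions format rather than anything llama-server currently returns.

```python
# Hypothetical request combining logprobs with the requested echo option.
# Assumes llama-server listens on http://localhost:8080 and exposes the
# OpenAI-compatible /v1/completions endpoint; "echo" is the missing feature.
import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "The capital of France is",
        "max_tokens": 4,
        "logprobs": 1,
        "echo": True,  # not yet supported; this issue asks for it
    },
)
data = resp.json()
# With echo=True, choices[0]["logprobs"] would also cover the prompt tokens
# (e.g. "tokens" and "token_logprobs" starting at the first prompt token),
# following the legacy OpenAI completions response shape.
print(data)
```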
Motivation
This would enable a number of interesting possibilities, such as inferring the likelihood of a prompt given a completion, as done in this project.
OpenAI deprecated the echo option because it's too useful :) It would be great to have it available in llama.cpp.
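To make the use case concrete, here is a small sketch of how a prompt's log-likelihood could be derived once prompt-token logprobs are echoed back. The "token_logprobs" field name follows the legacy OpenAI completions format and is an assumption for whatever llama-server would return.

```python
# Sketch: score a prompt by summing the echoed prompt-token logprobs.
# Field name "token_logprobs" is assumed from the legacy OpenAI format.
def prompt_log_likelihood(logprobs: dict, n_prompt_tokens: int) -> float:
    token_logprobs = logprobs["token_logprobs"][:n_prompt_tokens]
    # The first token usually has no logprob (nothing to condition on), so skip None.
    return sum(lp for lp in token_logprobs if lp is not None)
```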
Possible Implementation
No response
This would be similar to supporting --all-logits from llama-perplexity, right? It would be very useful in the server, allowing us to use the server for benchmarking as well.
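As a rough sketch of the benchmarking angle: with per-token logprobs echoed for the prompt, a client could compute perplexity over arbitrary text, analogous to what llama-perplexity reports. The input format here is an assumption (a list of logprobs with None for the first token), not an existing server response.

```python
# Sketch: perplexity over a token sequence from echoed per-token logprobs.
import math

def prompt_perplexity(token_logprobs: list[float]) -> float:
    lps = [lp for lp in token_logprobs if lp is not None]
    return math.exp(-sum(lps) / len(lps))
```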
I have a use case for this as well.
Any updates on this? It seems like an important feature.
This issue was closed because it has been inactive for 14 days since being marked as stale.
echo=True is used together with logprobs=True in lm-evaluation-harness with the local-completions model type for squadv2, and possibly for other benchmarks, so it would be nice to have this implemented. A rough sketch of such an invocation follows the links below.
See also:
- https://github.com/ggml-org/llama.cpp/issues/12591
- https://github.com/EleutherAI/lm-evaluation-harness/pull/2856
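For reference, a rough sketch of pointing lm-evaluation-harness at a llama-server endpoint via the local-completions backend. The model_args keys (model, base_url) and the task name are taken from the harness documentation and this thread, but they may differ between harness versions, so treat the details as assumptions rather than a verified invocation.

```python
# Rough sketch: running squadv2 through lm-evaluation-harness against a
# local llama-server completions endpoint. model_args keys are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="local-completions",
    model_args="model=llama,base_url=http://localhost:8080/v1/completions",
    tasks=["squadv2"],
)
print(results["results"])
```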
Is it possible to reopen this issue?
In server.cpp, where is the right place to store the tokenized prompt and the logprobs of the prompt tokens?
Also interested in this feature. It would be of great benefit for prompt analysis!