
Feature Request: echo=true in llama-server


Prerequisites

  • [X] I am running the latest code. Mention the version if possible as well.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

llama-server accepts API calls with logprobs=1, but it would be very nice to also support echo=True, as was available for older OpenAI completion models such as davinci-002.
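
For illustration, a minimal sketch of what the requested behaviour would look like against an OpenAI-compatible /v1/completions endpoint. The echo field is the proposed addition and is not currently honoured by llama-server; the URL, model name, and response field names are assumptions based on the legacy OpenAI completions API:

```python
import requests

# Sketch of the proposed behaviour: with echo=True the response would include
# the prompt tokens themselves along with their logprobs, mirroring the legacy
# OpenAI completions API (e.g. davinci-002).
resp = requests.post(
    "http://localhost:8080/v1/completions",  # assumed local llama-server address
    json={
        "model": "local",                    # hypothetical model name
        "prompt": "The capital of France is",
        "max_tokens": 1,
        "logprobs": 1,
        "echo": True,                        # proposed option from this issue
    },
)
data = resp.json()
# If echo were supported, logprobs.tokens / logprobs.token_logprobs would cover
# the prompt tokens too, not only the generated completion.
print(data["choices"][0]["logprobs"])
```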

Motivation

This would allow for a number of interesting possibilities such as inferring the likelihood of a prompt given a completion, as done in this project.

OpenAI deprecated the echo option because it's too useful :) It would be great to have it back in llama.cpp.

Possible Implementation

No response

ciaran-regan-ie avatar Aug 09 '24 08:08 ciaran-regan-ie

This would be similar to supporting --all-logits from llama-perplexity, right? It would be very useful to have this in the server, allowing us to use the server for benchmarking as well.

sragrawal avatar Aug 10 '24 12:08 sragrawal

I have a use case for this as well.

kaetemi avatar Sep 10 '24 18:09 kaetemi

Any updates on this? This seems like an important feature.

ciaran-regan-ie avatar Oct 26 '24 00:10 ciaran-regan-ie

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Dec 09 '24 01:12 github-actions[bot]

echo=True is used together with logprobs=True in lm-evaluation-harness with the local-completions model type for squadv2, and possibly for other benchmarks, so it would be nice to have this implemented.
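
For context, the scoring pattern the harness relies on looks roughly like the following. This is a sketch, not something llama-server supports today; the endpoint URL, model name, and the exact response field names (token_logprobs, etc.) are assumptions based on the legacy OpenAI completions API:

```python
import requests

# Rough sketch of the request pattern used to score a prompt continuation:
# no new tokens are generated, the prompt is echoed back with per-token logprobs.
resp = requests.post(
    "http://localhost:8080/v1/completions",  # assumed llama-server address
    json={
        "model": "local",                    # hypothetical model name
        "prompt": "Context: ...\nQuestion: ...\nAnswer: Paris",
        "max_tokens": 0,   # no new tokens -- we only want to score the prompt
        "echo": True,      # return the prompt tokens in the response
        "logprobs": 1,     # ...together with their logprobs
    },
).json()

lp = resp["choices"][0]["logprobs"]
# Loglikelihood of the answer span = sum of logprobs of its tokens; here we
# naively take the last two tokens as a stand-in for locating " Paris".
answer_logprob = sum(lp["token_logprobs"][-2:])
print(answer_logprob)
```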

See also:

  • https://github.com/ggml-org/llama.cpp/issues/12591
  • https://github.com/EleutherAI/lm-evaluation-harness/pull/2856

Is it possible to reopen this issue?

In server.cpp, where is the right place to store the tokenised prompt and the logprobs of the prompt tokens?

dodekapod avatar Apr 29 '25 07:04 dodekapod

Also interested in this feature. It would be of great benefit for prompt analysis!

mwebr avatar Jun 04 '25 22:06 mwebr