lm-evaluation-harness
update gguf backend to use the chat completions API
The response structure for logprobs from the /completion API was changed in https://github.com/ggml-org/llama.cpp/commit/57bb2c40cd94c5a09f5210ed8264cc93b21c4b7e. Furthermore, the completions API is now legacy (https://platform.openai.com/docs/guides/completions). This commit adapts the gguf backend to use the /chat/completions API and handles the logprobs response correctly. It also resolves https://github.com/ggml-org/llama.cpp/issues/12591, where llama-server rejected the echo parameter, since that parameter is no longer needed with the chat endpoint.
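For context, here is a minimal sketch of the kind of request and parsing this switch implies, written against the OpenAI-compatible /v1/chat/completions endpoint that llama-server exposes. The endpoint path, payload fields, and function name follow the documented OpenAI-style shape and are assumptions for illustration, not the actual code in this PR; whether a given llama-server build returns logprobs this way depends on its version.

```python
import requests


def chat_completion_logprobs(base_url, prompt, max_tokens=16, n_top=5):
    """Request a completion from an OpenAI-compatible /v1/chat/completions
    endpoint (as exposed by llama-server) and return per-token logprobs.

    Illustrative sketch only; field names assume the OpenAI-style response shape.
    """
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
        "logprobs": True,        # include a logprob for each generated token
        "top_logprobs": n_top,   # plus the n_top alternative tokens per position
    }
    resp = requests.post(f"{base_url}/v1/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()
    choice = resp.json()["choices"][0]
    # OpenAI-style chat responses nest logprobs under choices[i]["logprobs"]["content"]
    return [(tok["token"], tok["logprob"]) for tok in choice["logprobs"]["content"]]
```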
Hi! We should still keep the completions API as long as GGUF (llama.cpp server) supports it. Otherwise we would have to chat-format the prompt for base models as well.
There is an issue with the current implementation, as I pointed out in the issue linked in my last message. A llama-server started with the newest llama.cpp no longer supports the echo parameter, which the lm-eval gguf backend I modified relies on. Furthermore, the logprobs response structure that lm_eval/models/gguf.py expects was also changed in a llama.cpp update (see my last comment). As a result, the current gguf implementation of LM-Evaluation-Harness throws errors when I use it. My edits fix that, at least for the gguf file. We could also keep using the completions API, but we would need to adapt the expected response structure.
I think we could still use the completions API, but we would still have to adapt to the response coming from the server, since its structure has changed.
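To make that concrete, a defensive parser could accept both shapes. Which one a given llama-server build actually returns depends on its version, so treat the field names below as assumptions to verify against the server's real JSON rather than as the structure this PR implements:

```python
def extract_token_logprobs(choice):
    """Return a list of (token, logprob) pairs from a single response choice.

    Handles two possible shapes (both assumptions to verify against the server in use):
    - legacy /completions style: logprobs = {"tokens": [...], "token_logprobs": [...]}
    - chat-completions style:    logprobs = {"content": [{"token": ..., "logprob": ...}, ...]}
    """
    logprobs = choice.get("logprobs") or {}
    if "content" in logprobs:
        # chat-completions style: one dict per generated token
        return [(t["token"], t["logprob"]) for t in logprobs["content"]]
    if "token_logprobs" in logprobs:
        # legacy completions style: parallel arrays of tokens and logprobs
        return list(zip(logprobs["tokens"], logprobs["token_logprobs"]))
    raise ValueError("Unrecognized logprobs structure in server response")
```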