[Feature Request] Add support for logprobs to the mlx_lm server
I have begun a PR to lm-evaluation-harness to support MLX models, but have become bogged down in details about things that are already implemented in mlx_lm. As I have become more familiar with that framework, it seems all we really need in order to evaluate MLX models via the OpenAI-compatible API is to add support for logprobs to the current server infrastructure.
This would greatly improve the feedback loop of training models with MLX and then evaluating them in a semi-standard way against other models.
I would be happy to contribute a PR, but I need some pointers on the relationship between the log probabilities of output tokens (i.e., the "likelihood of each token occurring in the sequence given the context. To simplify, a logprob is log(p), where p = probability of a token occurring at a specific position based on the previous tokens in the context.") and the token probabilities we already return from mlx_lm.utils.generate_step.
The probability that gets returned is the probability of the given token at that time step. To get the log probability you would just take its log with mx.log(p).
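A minimal sketch of that relationship, assuming generate_step yields the sampled token together with its probability as described above (the value 0.9 is just a made-up example):

    import mlx.core as mx

    # Hypothetical value: p is the probability of the sampled token at one
    # generation step, as returned alongside the token by generate_step.
    p = mx.array(0.9)

    # The OpenAI-style logprob for that token is simply the natural log of p.
    logprob = mx.log(p)
    print(logprob.item())  # ~ -0.105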
I would be happy to contribute a PR,
Thanks! It would be great to ease the path to evaluating models in MLX.
Sounds good. I will contribute a PR for the server. Thanks for the input.
Looks like this is already supported https://github.com/ml-explore/mlx-lm/blob/59c2844cc202dc5510f65d1effc1606264ba4cd6/mlx_lm/SERVER.md?plain=1#L78
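For reference, a hedged request sketch against the server, assuming mlx_lm.server is already running locally on its default port 8080 with a model loaded, and that the logprobs parameter behaves as documented in SERVER.md linked above:

    import requests

    # Ask the OpenAI-compatible endpoint to return top logprobs for each
    # generated token (parameter semantics per SERVER.md).
    response = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Hello"}],
            "max_tokens": 16,
            "logprobs": 5,
        },
    )
    print(response.json()["choices"][0]["logprobs"])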
Btw, I added support for log-prob visualization with mlx-lm here: https://github.com/eli5-org/eli5/pull/56. I had to make some major tweaks to support it, because the format of logprobs in mlx-lm is very different from OpenAI's. The main difference is that OpenAI returns tokens as strings, while mlx-lm returns token ids, so on the client you need the same tokenizer as on the server to make it useful.
main difference is that OpenAI returns tokens as strings, while mlx-lm returns token ids
It might be worth changing that on the mlx-lm side if that's the standard
That would be great -- and it would be possible to provide both to avoid breaking backwards compatibility. Right now OpenAI returns a content field, like this:
ChoiceLogprobs(content=[
    ChatCompletionTokenLogprob(
        token="Hello",
        logprob=math.log(0.9),
        top_logprobs=[],
    ),
    ChatCompletionTokenLogprob(
        token=" world",
        logprob=math.log(0.2),
        top_logprobs=[],
    ),
    ChatCompletionTokenLogprob(
        token=" world",
        logprob=math.log(0.4),
        top_logprobs=[],
    ),
])
while from mlx-lm I get something like this, with token_logprobs and tokens fields:
ChoiceLogprobs(
    token_logprobs=[
        math.log(0.9),
        math.log(0.2),
        math.log(0.4),
    ],
    tokens=tokenizer.encode('Hello world world', add_special_tokens=False),
)