FastChat
Support logprob in OpenAI API
Thank you so much for your fantastic work. I've run into a small problem and really hope you can help me. After setting up the OpenAI API server, I tried sending 'logprobs=1' in a completion request to obtain the token confidences, as shown below,
completion = openai.Completion.create(model=model, prompt=prompt, logprobs=1, max_tokens=64)
but the returned results still give 'logprobs': null.
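For reference, with the official OpenAI API I would expect choices[0].logprobs to come back as something like this (the tokens and numbers below are made up, just to show the shape):

{'tokens': [' Hello', ',', ' world'],
 'token_logprobs': [-0.12, -0.05, -0.31],
 'top_logprobs': [{' Hello': -0.12}, {',': -0.05}, {' world': -0.31}],
 'text_offset': [0, 6, 7]}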
Could you please have a check? Thank you so much!
It has not been supported. Community contributions are welcome!
Found this issue due to the same requirement. I'll try to submit a PR for this soon.
Hey @comaniac, have you implemented this yet? I also wish to have this functionality supported. If not, I can also try to take a look.
Sorry, I ended up implementing another API server for our internal use, so I didn't work directly on FastChat. I'll see if I can find some time to work on it, but please submit a PR if you already have one and I'll help review.
I just took a detailed look and found that it's not straightforward to add this feature to FastChat. The main challenge is that the current FastChat workers only return decoded text without the list of tokens. However, the OpenAI logprobs protocol (call it LogProbs; see the sketch after the two options below) includes "tokens", "token_logprobs", "top_logprobs", and "text_offset". In other words, we cannot satisfy this protocol on the OpenAI API server alone. There are two options:
- Let each worker return LogProbs. The benefits of this approach are that 1) we almost don't need to change the OpenAI API server, we just need to pass the logprobs parameter to the worker; and 2) we can implement this feature worker by worker without breaking existing use cases. However, this approach is not compatible with other API servers, and workers are no longer orthogonal to the API servers.
- Let each worker return all necessary data, such as text, tokens, logprobs, etc. The benefit of this approach is that we only need to process the result at the OpenAI API server, and workers remain orthogonal to the API servers. However, this requires changing all workers at once, because the API server now expects different outputs.
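For reference, a minimal sketch of what that LogProbs protocol could look like as a pydantic model (the field names follow the OpenAI completion response; the class definition itself is only illustrative, not actual FastChat code):

from typing import Dict, List, Optional
from pydantic import BaseModel

class LogProbs(BaseModel):
    # Character offset of each token in the generated text.
    text_offset: List[int] = []
    # The decoded tokens, in generation order.
    tokens: List[str] = []
    # Log probability of each generated token.
    token_logprobs: List[Optional[float]] = []
    # For each position, a mapping of the top-N candidate tokens to their logprobs.
    top_logprobs: List[Optional[Dict[str, float]]] = []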
I could send a small PR with the protocols and parameter passing first while we discuss this issue.
I finally made the following decisions to deliver a working solution first:
Now the OpenAI API server expects the workers to return logprobs as a dictionary with the following keys: text_offset, tokens, token_logprobs, and top_logprobs; otherwise logprobs remains None. In this way, the change stays compatible with other workers, while the default worker is able to support logprobs.
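For illustration, a worker could build that dictionary with a small helper like the one below; the helper name, its arguments, and the use of torch are my own assumptions, not the PR code:

import torch

def append_token_logprobs(ret_logprobs, logits, token_id, tokenizer, text):
    # Hypothetical helper: record the logprob info for one generated token
    # in the dict shape the OpenAI API server expects (top-1, i.e. logprobs=1).
    log_probs = torch.log_softmax(logits, dim=-1)
    token_str = tokenizer.decode(token_id)
    ret_logprobs["text_offset"].append(len(text))  # offset of this token in the text so far
    ret_logprobs["tokens"].append(token_str)
    ret_logprobs["token_logprobs"].append(log_probs[token_id].item())
    ret_logprobs["top_logprobs"].append({token_str: log_probs[token_id].item()})

Workers that never populate such a dict keep returning logprobs as None, so nothing changes for them.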
How complex is it to generalize the code in the PR to logprobs > 1?
Is it just having token_logprobs as a list of lists in inference.py?
Not exactly. token_logprobs remains the same when logprobs>1. The difference is top_logprobs, which should look like
"top_logprobs": [
null,
{
")": -3.7536874 # This dict includes the logprobs of top-N tokens.
},
...
This is not hard to implement, but I just want to keep this PR as straightforward as possible, so anyone could follow up with the missing features.
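For anyone following up, a rough sketch of how each top-N entry could be computed per step (the function name and arguments are illustrative, and I'm assuming a torch-based worker):

import torch

def top_n_logprobs(logits, tokenizer, n):
    # Hypothetical helper: build one entry of "top_logprobs" from the raw
    # logits of the current step, i.e. a dict mapping the top-N candidate
    # tokens to their log probabilities.
    log_probs = torch.log_softmax(logits, dim=-1)
    values, indices = torch.topk(log_probs, n)
    return {tokenizer.decode(int(idx)): val.item() for val, idx in zip(values, indices)}

token_logprobs itself stays a flat list; only the per-position dicts grow to N entries.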