Use top_logprobs from OpenAI API
Is your feature request related to a problem? Please describe. The current remote implementation only requests one token at a time.
Describe the solution you'd like
For OpenAI, we could retrieve up to 20 candidate tokens per position with the `top_logprobs` API parameter.
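A minimal sketch of what that request could look like, assuming the v1+ `openai` Python client; the model name is illustrative:

```python
# Sketch only: assumes the v1+ `openai` Python client and an illustrative
# model name; OPENAI_API_KEY is read from the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": "The capital of France is"}],
    max_tokens=5,
    logprobs=True,
    top_logprobs=20,  # the API allows up to 20 alternatives per position
)

# Each sampled token carries its top-20 alternatives and their logprobs.
for token_info in response.choices[0].logprobs.content:
    alts = [(alt.token, alt.logprob) for alt in token_info.top_logprobs]
    print(token_info.token, alts)
```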
Agreed, this would be majorly helpful.
As far as I understand the OpenAI API, you can receive top_logprobs, and you can even receive tokens as a stream, but you cannot influence the inference process: you receive the token chosen by the model, and you cannot change it.
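To illustrate the point, here is a hedged sketch (same assumptions as above: v1+ `openai` client, illustrative model name). The alternatives are visible token by token in the stream, but they are purely observational:

```python
# Sketch only: logprobs arrive with the stream, but there is no way to
# reject a sampled token and steer generation elsewhere.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": "Answer with one word: yes or no?"}],
    logprobs=True,
    top_logprobs=5,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    choice = chunk.choices[0]
    if choice.logprobs and choice.logprobs.content:
        for token_info in choice.logprobs.content:
            # We can watch the alternatives the model considered...
            print(token_info.token, [alt.token for alt in token_info.top_logprobs])
            # ...but the sampled token is already final; the API offers no
            # hook to override it and resample.
```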
So, if the guidance team confirms that, we can close this issue as currently unresolvable...
Someone courageous enough could even open an issue in the OpenAI API repository asking them to change the API. Yet it is still an open question whether guidance-style inference over such a modified API would be efficient enough. Has anyone checked how badly the guidance library performs with a local LLM hosted on a remote web server?
Seeing this late, but @Columpio is exactly right -- we have incredibly limited control over remote models that have not integrated guidance support into their servers. Even if we wanted to issue a new API call for each token we wanted to modify, the Chat API for OpenAI doesn't reliably allow us to pre-fill the assistant response, which means we can't do anything better than regenerating the entire response for OpenAI models.
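For concreteness, here is a hypothetical sketch of the per-token loop such control would require, and where it breaks. `is_allowed` is an assumed constraint callback and the model name is illustrative; the assistant-prefix message marked (2) is exactly the unreliable part, since the Chat API treats a trailing assistant message as a finished turn, not as a prefix to continue:

```python
# Hypothetical sketch, not working code: the per-token loop guidance would
# need against a remote chat endpoint.
from openai import OpenAI

client = OpenAI()

def constrained_generate(prompt, is_allowed, max_steps=50):
    accepted = ""  # assistant text accepted so far
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative
            messages=[
                {"role": "user", "content": prompt},
                # (2) hypothetical prefix continuation -- not dependable
                {"role": "assistant", "content": accepted},
            ],
            max_tokens=1,
            logprobs=True,
            top_logprobs=20,
        )
        token_info = response.choices[0].logprobs.content[0]
        # (1) take the most likely alternative that satisfies the constraint
        for alt in sorted(token_info.top_logprobs,
                          key=lambda a: a.logprob, reverse=True):
            if is_allowed(accepted + alt.token):
                accepted += alt.token
                break
        else:
            return accepted  # no allowed continuation found
    return accepted
```

Without dependable prefix continuation at step (2), every rejected token forces a full regeneration, which is why regenerate-and-check is the best available fallback.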
@Harsha-Nori Thanks!
I wonder whether you have been in communication with OpenAI or any other providers about adding guidance support to their APIs. It could be beneficial both for you and for the community. What do you think?