OpenLLM
feat: Document usage of request_id
Feature request
Hi,
The /v1/generate endpoint returns a request_id as part of the JSON response. I assume that when finished is set to false, I can somehow use this request ID to query for the rest of the output later. However, the OpenAPI documentation I can access under http://127.0.0.1:3000/ does not seem to document anywhere which endpoint to use for that. Or am I mistaken and this is not possible?
I am using the latest ghcr.io/bentoml/openllm Docker image like this:
docker run --rm -it -p 3000:3000 --platform linux/x86_64 ghcr.io/bentoml/openllm start facebook/opt-1.3b --backend pt
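For reference, this is a minimal sketch of how I read those two fields out of the /v1/generate response today. The field names request_id and finished come from the response I observe; the helper name and the default value for finished are my own assumptions:

```python
import json

# Sketch: pull request_id and finished out of a /v1/generate JSON body.
# "request_id" and "finished" are the field names I see in the response;
# everything else here is my own assumption, not documented behavior.
def parse_generate_response(raw_body: str):
    """Return (request_id, finished) from a /v1/generate JSON body."""
    body = json.loads(raw_body)
    # Assume a missing "finished" field means the generation is complete.
    return body.get("request_id"), body.get("finished", True)

# Example with a made-up response body:
sample = '{"request_id": "abc123", "finished": false, "outputs": []}'
print(parse_generate_response(sample))  # ('abc123', False)
```

What I am missing is what to do with that request_id once finished is false, i.e. which endpoint (if any) accepts it to fetch the remaining output.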
Kind regards, Alexander
Motivation
This feature would allow me to find out how the request_id can be used to follow up on incomplete queries, if this is at all possible.
Other
No response