drbh
notes
- update to handle multiple requests instead of just the first
- stream responses back with an index
- docs: https://platform.openai.com/docs/api-reference/completions/create
example requests: streaming with openai

```python
from openai import OpenAI

YOUR_TOKEN = "YOUR_API_KEY"

# Initialize the client, pointing it to one of the available models
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key=YOUR_TOKEN,
)
```
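A server-free sketch of what the indexed streaming behavior in this PR enables: with multiple prompts in one request, streamed chunks arrive interleaved, and each choice's `index` ties it back to its prompt, so the client can demultiplex them. The chunk dicts below are simulated for illustration, not captured server output:

```python
# Simulated streamed chunks for a two-prompt completion request.
# Each chunk carries one choice whose `index` identifies its prompt.
chunks = [
    {"choices": [{"index": 0, "text": "Deep"}]},
    {"choices": [{"index": 1, "text": "A GPU"}]},
    {"choices": [{"index": 0, "text": " learning is..."}]},
    {"choices": [{"index": 1, "text": " is a processor..."}]},
]

# Demultiplex: accumulate streamed text fragments per prompt index.
outputs: dict[int, str] = {}
for chunk in chunks:
    choice = chunk["choices"][0]
    outputs[choice["index"]] = outputs.get(choice["index"], "") + choice["text"]

print(outputs[0])  # Deep learning is...
print(outputs[1])  # A GPU is a processor...
```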
**Note:** the client library intentionally does not include a `completions` method because this is a legacy API. The changes in this PR are to align with the API and address...
**Note:** the failing client tests do not appear to be related to these changes and are resolved here: https://github.com/huggingface/text-generation-inference/pull/1751
> ...The logs are rather poor compared to the regular endpoints.
>
> ```
> 2024-04-16T10:42:49.931556Z  INFO text_generation_router::server: router/src/server.rs:500: Success
> ```
>
> vs
>
> ```
> 2024-04-16T10:42:56.302342Z...
> ```
Logs are now bubbled up to the calling function and output the same information as `generate` and `generate_stream`.

Change: `generate_internal` and `generate_stream_internal` now take a `span` as an argument and...
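The actual change uses Rust's `tracing` spans in the router; as a rough Python analogue of the pattern only (all names below are illustrative, not from the codebase), the shared internal helper logs through context handed in by each public endpoint, so both endpoints emit identically shaped log lines attributed to the caller rather than the helper:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(levelname)s %(endpoint)s: %(message)s")
logger = logging.getLogger("router")

def generate_internal(prompt: str, span: logging.LoggerAdapter) -> str:
    # Log through the caller-supplied context so the "Success" line is
    # attributed to the public endpoint, not this shared helper.
    result = prompt.upper()  # stand-in for actual text generation
    span.info("Success")
    return result

def generate(prompt: str) -> str:
    span = logging.LoggerAdapter(logger, {"endpoint": "generate"})
    return generate_internal(prompt, span)

def generate_stream(prompt: str) -> str:
    span = logging.LoggerAdapter(logger, {"endpoint": "generate_stream"})
    return generate_internal(prompt, span)

print(generate("hello"))  # logs "INFO generate: Success"
```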
Hi @daz-williams, thank you for using TGI and opening this issue. However, this is the intended functionality, since `details` are not a concept in the chat API. The `/v1/chat/completions`...
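To illustrate the distinction, here are abbreviated, illustrative payload shapes (not captured from a live server): TGI's native `/generate` endpoint can include a `details` object, while `/v1/chat/completions` follows the OpenAI chat schema, which has no such field:

```python
# Abbreviated sketch of a native `/generate` response with details enabled.
generate_response = {
    "generated_text": "Deep learning is...",
    "details": {"finish_reason": "length", "generated_tokens": 10},
}

# Abbreviated sketch of an OpenAI-schema `/v1/chat/completions` response;
# there is no `details` field in this schema.
chat_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Deep learning is..."}}
    ],
}

print("details" in generate_response)  # True
print("details" in chat_response)      # False
```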
closing this issue as this is the expected functionality (described above)
closing since this is a bit outdated and a better impl strategy has been suggested above. Thanks!
resolved with https://github.com/huggingface/text-generation-inference/pull/1693