drbh

69 comments by drbh

Notes:
- update to handle multiple requests instead of just the first
- stream responses back with an index
- docs: https://platform.openai.com/docs/api-reference/completions/create
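The "stream responses back with an index" behavior can be sketched as follows. This is a minimal illustration of the OpenAI completions convention (each chunk's `choices[].index` says which prompt it belongs to), not TGI's actual router code; `generate_tokens` is a hypothetical callback standing in for the model.

```python
def stream_completions(prompts, generate_tokens):
    """Yield OpenAI-style chunks for every prompt, not just the first.

    `generate_tokens` maps a prompt to an iterable of generated tokens
    (a hypothetical stand-in for the model backend).
    """
    for index, prompt in enumerate(prompts):
        for token in generate_tokens(prompt):
            # Tag each chunk with the prompt's index, mirroring the
            # `choices[].index` field in the completions API.
            yield {
                "object": "text_completion",
                "choices": [{"index": index, "text": token}],
            }


# Toy token generator standing in for the model.
fake_model = lambda prompt: ("echo: " + prompt).split()

chunks = list(stream_completions(["hi", "bye"], fake_model))
indices = sorted({c["choices"][0]["index"] for c in chunks})
```

A client can then demultiplex the stream by grouping chunks on that index.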

Example requests: streaming with openai

```python
from openai import OpenAI

YOUR_TOKEN = "YOUR_API_KEY"

# Initialize the client, pointing it to one of the available models
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key=YOUR_TOKEN,
    ...
```
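For reference, the request body such a client call sends on the wire looks roughly like this. A standard-library-only sketch, so no server is needed; the field names follow the completions API reference linked in the notes, while the `"tgi"` model name is an assumption for illustration.

```python
import json

payload = {
    "model": "tgi",                # illustrative model name (assumption)
    "prompt": ["hello", "world"],  # multiple prompts in a single request
    "max_tokens": 16,
    "stream": True,                # ask for server-sent event chunks
}

# This body would be POSTed to http://localhost:3000/v1/completions
# with an "Authorization: Bearer <token>" header.
body = json.dumps(payload)
```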

**Note:** the client library intentionally does not include a `completions` method because this is a legacy API. The changes in this PR are to align with the API and address...

The failing client tests do not seem related to these changes and are resolved here: https://github.com/huggingface/text-generation-inference/pull/1751

> ...The logs are rather poor compared to the regular endpoints.
>
> ```
> 2024-04-16T10:42:49.931556Z INFO text_generation_router::server: router/src/server.rs:500: Success
> ```
>
> vs
>
> ```
> 2024-04-16T10:42:56.302342Z...
> ```

Logs are now bubbled up to the calling function and output the same information as `generate` and `generate_stream`. Change: `generate_internal` and `generate_stream_internal` now take a `span` as an argument and...
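The pattern is roughly the following, sketched here in Python rather than the router's Rust, with illustrative names (not TGI's actual functions): the shared helper stops creating its own logging context and instead receives the caller's, so the new endpoint's success line is attributed to the public route.

```python
import logging

logging.basicConfig(level=logging.INFO)


def generate_internal(inputs, span):
    """Shared generation helper; logs through the caller-supplied span."""
    output = inputs.upper()  # stand-in for the real generation work
    # The record is attributed to the public endpoint, not this helper,
    # so every route produces the same "Success" line.
    span.info("Success: generated %d chars", len(output))
    return output


def completions_endpoint(inputs):
    # A logger stands in for a tracing span here.
    span = logging.getLogger("v1/completions")
    return generate_internal(inputs, span)


result = completions_endpoint("hello")
```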

Hi @daz-williams, thank you for using TGI and opening this issue; however, this is the intended functionality, since `details` are not a concept in the chat API. The `/v1/chat/completions`...
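To make the distinction concrete, here is a sketch of the two request shapes, assuming TGI's native `/generate` route accepts `parameters.details` while the OpenAI-compatible chat route mirrors OpenAI's schema, which has no such field.

```python
# Native TGI /generate request: token-level details are requested here.
native_request = {
    "inputs": "What is deep learning?",
    "parameters": {
        "max_new_tokens": 32,
        "details": True,  # only the native API exposes this flag
    },
}

# OpenAI-compatible /v1/chat/completions request: no `details` field,
# because the chat schema follows OpenAI's, which has no such concept.
chat_request = {
    "messages": [{"role": "user", "content": "What is deep learning?"}],
    "max_tokens": 32,
}
```

Clients that need per-token information should call the native endpoint directly.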

Closing this issue, as this is the expected functionality (described above).

Closing, since this is a bit outdated and a better implementation strategy has been suggested above. Thanks!

Resolved with https://github.com/huggingface/text-generation-inference/pull/1693