Yuchao Zhang
I think you could customize the logic in [postprocess](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/postprocessing/1/model.py) and [preprocess](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/preprocessing/1/model.py) to do the calculation.
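For reference, here's a rough sketch of where such a calculation could live in the postprocessing model's `execute`. The tensor names (`TOKENS_BATCH`, `OUTPUT`, `OUTPUT_TOKEN_COUNT`), the tokenizer path, and the extra output are assumptions for illustration; the real model.py reads these from the model config, and any new output would also have to be declared in `config.pbtxt`:

```python
# Sketch of a customized Triton Python backend postprocessor that adds a
# per-sequence calculation (here: generated token counts) next to the usual
# decoding step. Tensor names and the tokenizer are assumed, not taken from
# the stock config -- verify against your deployed model before reusing.
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Assumed tokenizer; the stock model.py loads it from model parameters.
        self.tokenizer = AutoTokenizer.from_pretrained("gpt2")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Generated token IDs, shaped [batch, beam, seq_len] in the stock model.
            tokens = pb_utils.get_input_tensor_by_name(
                request, "TOKENS_BATCH").as_numpy()

            # Custom calculation: count generated tokens per sequence.
            token_count = np.array(
                [[seq.size for seq in beam] for beam in tokens],
                dtype=np.int32)

            # Decode token IDs back to text, as the stock postprocessor does.
            texts = np.array(
                [[self.tokenizer.decode(seq) for seq in beam] for beam in tokens],
                dtype=object)

            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("OUTPUT", texts),
                # Extra output carrying the custom calculation.
                pb_utils.Tensor("OUTPUT_TOKEN_COUNT", token_count),
            ]))
        return responses
```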
@nirga thanks for this update. Any ETA on publishing a new version so we can use it for auto-instrumentation with `opentelemetry-instrument`?
How does it work?
Sorry, I think I misunderstood the `n` in trtllm; I expected multiple beams to be returned. According to this thread, https://github.com/triton-inference-server/tensorrtllm_backend/issues/499, maybe I need to make multiple requests...
By the way, do you know what `choice.index` would look like when using `stream` along with `n>1`?
I did not find this option in https://platform.openai.com/docs/api-reference/chat/create
@visitsb It's fine to add `/v1/models`. But the [full OpenAI API](https://platform.openai.com/docs/api-reference/introduction) is a long list, with endpoints like `/v1/audio` and `/v1/embeddings`. What's the minimal subset that's needed?
The exposed API depends on the actual model hosted in the Triton backend. Since there's no embedding model available in trtllm, `/v1/embeddings` is not possible. For an embedding model, maybe you can...
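As a starting point, here's a minimal sketch of what a `/v1/models` route could look like. FastAPI is assumed here, and `list_triton_models()` is a hypothetical helper that would query the Triton model repository (e.g. via `tritonclient`'s model repository index); swap in however the frontend is actually wired to the server:

```python
# Minimal sketch of a /v1/models route for an OpenAI-compatible frontend.
# FastAPI is an assumption; list_triton_models() is a placeholder helper.
from fastapi import FastAPI

app = FastAPI()


def list_triton_models() -> list[str]:
    # Placeholder: replace with a real query against the Triton server,
    # e.g. tritonclient's get_model_repository_index().
    return ["ensemble"]


@app.get("/v1/models")
def get_models():
    # Mirror the shape of the OpenAI "list models" response.
    return {
        "object": "list",
        "data": [
            {"id": name, "object": "model", "owned_by": "triton"}
            for name in list_triton_models()
        ],
    }
```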
@0xMochan Happy new year. May I kindly ask whether the team is back in the office? Here's the latest OpenAI OpenAPI spec on `tool_calls`: https://github.com/openai/openai-openapi/blob/d2eaa350b5b619ad6355384279a9beb9d423d88b/openapi.yaml#L12733-L12737, and it accepts an empty list, as in https://github.com/openai/openai-openapi/blob/d2eaa350b5b619ad6355384279a9beb9d423d88b/openapi.yaml#L7781
I'm not sure whether the input tokens have exceeded the max token limit. You could also check the postprocessing part in Triton to debug the generated tokens if possible.
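For example, one quick way to see what the postprocessor actually receives is to log the raw generated token IDs inside its `execute` before they are decoded. The `TOKENS_BATCH` input name is an assumption taken from the stock postprocessing model; adjust it to your `config.pbtxt`:

```python
# Hypothetical debug logging inside postprocessing/1/model.py's execute():
# dump the raw generated token IDs so you can see whether the output is
# empty, truncated, or hitting the max-token limit.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            tokens = pb_utils.get_input_tensor_by_name(
                request, "TOKENS_BATCH").as_numpy()
            # Log shape and IDs to the Triton server log for inspection.
            pb_utils.Logger.log_info(
                f"generated tokens shape={tokens.shape}, ids={tokens.tolist()}")
            # ... the existing decoding / response-building logic that
            # appends to `responses` continues here ...
        return responses
```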