
How to set request id in Generate?

Missmiaom opened this issue 1 year ago

This doesn't seem to work:

/v2/models/ensemble/generate

{
    "text_input": "...",
    "parameters": {
        "id": "123"
    },
    "sequence_id": "456"
}
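For reference, a request like this would typically be sent to the generate endpoint as follows (a minimal sketch; the host and port are assumptions):

curl -X POST localhost:8000/v2/models/ensemble/generate \
    -H "Content-Type: application/json" \
    -d '{"text_input": "...", "parameters": {"id": "123"}, "sequence_id": "456"}'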

verbose log:

[screenshot of verbose log]

Missmiaom (Mar 08 '24)

Hi @Missmiaom,

Can you please provide full reproduction steps (all commands run) and fill out the bug template below? Thank you!

Description
A clear and concise description of what the bug is.

Triton Information
What version of Triton are you using?

Are you using the Triton container or did you build it yourself?

To Reproduce
Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs); ideally include the model configuration file (if using an ensemble, include the model configuration file for that as well).

Expected behavior
A clear and concise description of what you expected to happen.

yinggeh (Mar 08 '24)

Triton Information
What version of Triton are you using? 23.10

Are you using the Triton container or did you build it yourself? Container.

To Reproduce
When I use the ensemble model and call it through the /v2/models/ensemble/generate endpoint, the request ID I pass is not printed in Triton's built-in verbose log.

Expected behavior
Triton's built-in verbose log prints the request ID.
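For reference, verbose logging is enabled with the --log-verbose flag when starting the server (a minimal sketch; the model repository path is an assumption):

tritonserver --model-repository=/models --log-verbose=1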

@yinggeh

Missmiaom (Mar 11 '24)

Hi @Missmiaom. Thanks for waiting. I have opened ticket DLIS-6456 for our engineers to investigate.

yinggeh (Apr 10 '24)

@yinggeh Has this feature been completed?

dafu-wu (Jul 08 '24)

@dafu-wu PR https://github.com/triton-inference-server/server/pull/7392 is currently under review. Thanks for your patience.

yinggeh (Jul 09 '24)

PR https://github.com/triton-inference-server/server/pull/7392 merged. Reopen for further questions.
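With this change, the request ID should be settable via a top-level "id" field in the generate request, along these lines (a sketch based on the request body quoted later in this thread; host, port, and payload values are assumptions):

curl -X POST localhost:8000/v2/models/ensemble/generate \
    -d '{"id": "123", "text_input": "What is machine learning?", "max_tokens": 20}'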

yinggeh (Jul 15 '24)

@yinggeh Thanks for the update. I used the latest Triton server image (24.07) to build the TensorRT-LLM backend, but the log still has a problem:

[screenshot of verbose log]

Request body:

{
    "id": "11111",
    "sequence_id": "456",
    "text_input": "What is machine learning?",
    "max_tokens": 20,
    "bad_words": "",
    "stop_words": "",
    "pad_id": 2,
    "end_id": 2,
    "top_p": 1,
    "top_k": 1,
    "temperature": 0.7
}

Is there something wrong with the way I am using it? @shreyas-samsung, can you help explain?

dafu-wu (Aug 15 '24)

@dafu-wu Looks like PR https://github.com/triton-inference-server/server/pull/7392 missed the 24.07 release deadline. Could you try building from the latest source, or wait for the 24.08 image?
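Once published, the 24.08 container should be available under the usual NGC name (an assumption based on the standard release naming pattern):

docker pull nvcr.io/nvidia/tritonserver:24.08-py3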

yinggeh (Aug 15 '24)

@yinggeh Do you know which Dockerfile is used to build the official image?

dafu-wu (Aug 15 '24)

@dafu-wu Thanks for your patience. Are you referring to Dockerfile.* in the server repo?

yinggeh (Aug 18 '24)