
chore: Refactor return of first gen token in PD

Open · Shunkangz opened this pull request 9 months ago · 19 comments

In the current implementation, the first and second generated tokens are returned together from the generation worker. This PR refactors that logic so that the context worker returns the first generated token as soon as it finishes its computation.
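The change can be illustrated with a minimal sketch. This is not the actual TensorRT-LLM implementation; all names, data structures, and the toy "sampling" logic below are hypothetical stand-ins, kept only to show how the first token moves from the generation worker's response to the context worker's response.

```python
# Hypothetical sketch of the refactor: in disaggregated (prefill/decode,
# "PD") serving, the context worker already produces the first token
# during prefill. Before the change, that token was shipped along with
# the request and emitted by the generation worker together with the
# second token; after the change, the context worker streams it back
# as soon as prefill completes.
from dataclasses import dataclass, field


@dataclass
class Response:
    request_id: int
    tokens: list = field(default_factory=list)


def context_worker(request_id: int, prompt: list) -> tuple[Response, int]:
    """Prefill: build the KV cache and sample the first token."""
    first_token = sum(prompt) % 1000  # stand-in for real sampling
    # Refactored behavior: return the first token immediately,
    # instead of deferring it to the generation worker.
    return Response(request_id, [first_token]), first_token


def generation_worker(request_id: int, first_token: int, steps: int) -> list:
    """Decode: continue from the first token; it no longer re-emits it."""
    out = []
    tok = first_token
    for _ in range(steps):
        tok = (tok + 1) % 1000  # stand-in for one decode step
        out.append(tok)
    return out


# Before: the client sees nothing until the generation worker returns
# [first_token, second_token, ...]. After: the client sees first_token
# as soon as prefill finishes, which lowers time-to-first-token.
early, first = context_worker(0, [1, 2, 3])
rest = generation_worker(0, first, steps=3)
print(early.tokens + rest)  # first token followed by the decode tokens
```

The observable token stream is unchanged; only which worker emits the first token (and when) differs, which is why the discussion below concludes the existing integration tests can be reused.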

Shunkangz avatar Mar 23 '25 14:03 Shunkangz

/bot run

Shunkangz avatar Mar 23 '25 14:03 Shunkangz

@chuangz0 @xiaoweiw-nv @pcastonguay can you help review this PD-related MR?

Thanks,
June

juney-nvidia avatar Mar 23 '25 22:03 juney-nvidia

/bot run

Shunkangz avatar Mar 24 '25 08:03 Shunkangz

/bot run

Shixiaowei02 avatar Mar 24 '25 08:03 Shixiaowei02

/bot run

Shixiaowei02 avatar Mar 24 '25 08:03 Shixiaowei02

PR_Github #270 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 08:03 niukuo

PR_Github #270 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #260 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 12:03 niukuo

Can we add a description? What is this PR fixing? Can we add unit or integration tests to verify what this PR is fixing? See existing integration tests in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/integration/defs/disaggregated/test_disaggregated.py.

pcastonguay avatar Mar 24 '25 13:03 pcastonguay

> Can we add a description? What is this PR fixing? Can we add unit or integration tests to verify what this PR is fixing? See existing integration tests in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/integration/defs/disaggregated/test_disaggregated.py.

I think this MR does not change the expected output of PD, so we can reuse the existing integration tests. Does that make sense to you? @pcastonguay

Shunkangz avatar Mar 25 '25 13:03 Shunkangz

/bot run

Shunkangz avatar Mar 25 '25 14:03 Shunkangz

PR_Github #444 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 14:03 niukuo

PR_Github #444 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #379 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 14:03 niukuo

> > Can we add a description? What is this PR fixing? Can we add unit or integration tests to verify what this PR is fixing? See existing integration tests in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/integration/defs/disaggregated/test_disaggregated.py.
>
> I think this MR does not change the expected output of PD, so we can reuse the existing integration tests. Does that make sense to you? @pcastonguay

Yes, thanks.

pcastonguay avatar Mar 25 '25 18:03 pcastonguay

/bot run

Shunkangz avatar Mar 26 '25 06:03 Shunkangz

PR_Github #529 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 06:03 niukuo

PR_Github #529 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #452 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 07:03 niukuo

/bot run

Shunkangz avatar Mar 28 '25 03:03 Shunkangz

PR_Github #666 [ run ] triggered by Bot

tensorrt-cicd avatar Mar 28 '25 03:03 tensorrt-cicd

PR_Github #666 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #560 completed with status: 'SUCCESS'

tensorrt-cicd avatar Mar 28 '25 06:03 tensorrt-cicd

/bot reuse-pipeline

Shunkangz avatar Apr 01 '25 04:04 Shunkangz

PR_Github #840 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd avatar Apr 01 '25 04:04 tensorrt-cicd

PR_Github #840 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #666 for commit 613ec08

tensorrt-cicd avatar Apr 01 '25 04:04 tensorrt-cicd