chore: Refactor return of first gen token in PD
In the current implementation, the generation worker returns the first and second generated tokens together. This PR refactors that logic so that the context worker returns the first generated token as soon as it finishes its computation.
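To make the change in response ordering concrete, here is a minimal toy sketch in Python. It is not the TensorRT-LLM implementation: `Response`, `toy_next_token`, `context_phase`, `old_flow`, `new_flow`, and `flatten` are hypothetical names made up for illustration, the "model" is a stand-in that just adds 1, and the sketch assumes at least two tokens are requested.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Response:
    source: str        # which worker emitted this response: "context" or "generation"
    tokens: List[int]  # tokens carried by this response


def toy_next_token(prev: int) -> int:
    """Hypothetical stand-in for one model step."""
    return prev + 1


def context_phase(prompt: List[int]) -> int:
    """Prefill on the context worker; yields the first generated token."""
    return toy_next_token(prompt[-1])


def old_flow(prompt: List[int], max_new_tokens: int) -> Iterator[Response]:
    """Before: the first token is held back and sent with the second one."""
    first = context_phase(prompt)
    second = toy_next_token(first)
    yield Response("generation", [first, second])
    prev = second
    for _ in range(max_new_tokens - 2):
        prev = toy_next_token(prev)
        yield Response("generation", [prev])


def new_flow(prompt: List[int], max_new_tokens: int) -> Iterator[Response]:
    """After: the context worker emits the first token as soon as prefill ends."""
    first = context_phase(prompt)
    yield Response("context", [first])
    prev = first
    for _ in range(max_new_tokens - 1):
        prev = toy_next_token(prev)
        yield Response("generation", [prev])


def flatten(responses: List[Response]) -> List[int]:
    """Concatenate the tokens of all responses in arrival order."""
    return [t for r in responses for t in r.tokens]


old = list(old_flow([1, 2, 3], max_new_tokens=4))
new = list(new_flow([1, 2, 3], max_new_tokens=4))
print([(r.source, r.tokens) for r in old])
print([(r.source, r.tokens) for r in new])
print(flatten(old) == flatten(new))
```

In this toy model the concatenated token sequence is the same in both flows; only the worker that emits the first token and the grouping of tokens per response differ.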
/bot run
@chuangz0 @xiaoweiw-nv @pcastonguay Can you help review this PD-related MR?
Thanks June
/bot run
/bot run
/bot run
PR_Github #270 [ run ] triggered by Bot
PR_Github #270 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #260 completed with status: 'FAILURE'
Can we add a description? What is this PR fixing? Can we add unit or integration tests to verify what this PR is fixing? See existing integration tests in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/integration/defs/disaggregated/test_disaggregated.py.
I don't think this MR changes the expected output of PD, so we can reuse the existing integration tests. Does that make sense to you, @pcastonguay?
/bot run
PR_Github #444 [ run ] triggered by Bot
PR_Github #444 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #379 completed with status: 'FAILURE'
Yes, that makes sense. Thanks.
/bot run
PR_Github #529 [ run ] triggered by Bot
PR_Github #529 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #452 completed with status: 'FAILURE'
/bot run
PR_Github #666 [ run ] triggered by Bot
PR_Github #666 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #560 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #840 [ reuse-pipeline ] triggered by Bot
PR_Github #840 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #666 for commit 613ec08