perf: Enable CUDA graphs when attention DP is used and active requests on different GPUs are uneven
This PR changes how dummy requests are handled so that CUDA graphs can still be used when attention DP is enabled and the number of active requests differs across GPUs.
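For readers unfamiliar with the idea, here is a rough sketch of how dummy requests can even out batch sizes across ranks so that every rank hits the same captured CUDA graph. It is an illustration only, not code from this PR: the function name `plan_dummy_padding`, its parameters, and the "round up to the nearest captured batch size" policy are all assumptions.

```python
from bisect import bisect_left
from typing import List, Optional


def plan_dummy_padding(
    active_per_rank: List[int], graph_batch_sizes: List[int]
) -> Optional[List[int]]:
    """Hypothetical helper: number of dummy requests each rank would add.

    Assumes every attention-DP rank must run the same batch size, and that
    CUDA graphs were captured only for the sizes in graph_batch_sizes.
    Returns None when no captured size fits (e.g. fall back to eager mode).
    """
    sizes = sorted(graph_batch_sizes)
    target = max(active_per_rank)  # the busiest rank sets the batch size
    idx = bisect_left(sizes, target)  # smallest captured size >= target
    if idx == len(sizes):
        return None
    padded = sizes[idx]
    return [padded - n for n in active_per_rank]


# Four ranks with uneven load, graphs captured for batch sizes 1, 2, 4, 8.
print(plan_dummy_padding([3, 1, 0, 4], [1, 2, 4, 8]))  # [1, 3, 4, 0]
```

In this sketch an idle rank (0 active requests) still pads up to the shared batch size, which is what lets all ranks replay the same graph.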
/bot run
PR_Github #266 [ run ] triggered by Bot
PR_Github #266 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #256 completed with status: 'FAILURE'
/bot run
PR_Github #322 [ run ] triggered by Bot
PR_Github #322 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #300 completed with status: 'FAILURE'
/bot run
PR_Github #353 [ run ] triggered by Bot
PR_Github #353 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #323 completed with status: 'FAILURE'
/bot run
PR_Github #410 [ ] completed with state FAILURE
PR_Github #414 [ ] completed with state FAILURE
PR_Github #418 [ run ] triggered by Bot
PR_Github #418 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #362 completed with status: 'SUCCESS'
/bot run
PR_Github #450 [ run ] triggered by Bot
PR_Github #450 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #385 completed with status: 'FAILURE'
/bot run
PR_Github #483 [ run ] triggered by Bot
PR_Github #483 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #416 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #494 [ run ] triggered by Bot
/bot kill
PR_Github #502 [ kill ] triggered by Bot
PR_Github #502 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 802f729
/bot --help
/bot run
PR_Github #504 [ run ] triggered by Bot
PR_Github #504 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #433 completed with status: 'FAILURE'
/bot run