fix: Reverse graph size order
In experiments, the graph size oscillated frequently during CUDA graph capture, which made the total memory used by the graphs larger than expected. This change reverses the order of batch sizes during capture so that the largest batch size is captured first. Graphs for smaller batch sizes can then reuse the memory already allocated for the larger ones.
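To illustrate why capture order matters, here is a minimal sketch in plain Python. It is a toy model, not the code in this PR: `capture_order`, `simulated_pool_size`, and `bytes_per_sample` are hypothetical names, and the pool is assumed to reuse any existing block that is large enough for a request and to allocate a new block otherwise. The real CUDA allocator is more complex.

```python
def capture_order(batch_sizes):
    """Return batch sizes largest-first (hypothetical helper for this sketch)."""
    return sorted(batch_sizes, reverse=True)


def simulated_pool_size(order, bytes_per_sample=1):
    """Toy shared pool: a request reuses an existing block that is large
    enough; otherwise a new block of the requested size is allocated.
    Returns the total size of all blocks held by the pool."""
    blocks = []
    for batch_size in order:
        need = batch_size * bytes_per_sample
        if not any(block >= need for block in blocks):
            blocks.append(need)
    return sum(blocks)


sizes = [1, 2, 4, 8]
ascending_total = simulated_pool_size(sorted(sizes))
descending_total = simulated_pool_size(capture_order(sizes))
print(ascending_total, descending_total)  # prints "15 8"
```

In this model, capturing smallest-first forces a new block for every batch size, while capturing largest-first lets every later capture fit into the first block.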
/bot run
PR_Github #626 [ run ] triggered by Bot
PR_Github #626 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #527 completed with status: 'FAILURE'
/bot run
PR_Github #799 [ run ] triggered by Bot
PR_Github #799 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #645 completed with status: 'FAILURE'
/bot run
PR_Github #807 [ run ] triggered by Bot
PR_Github #807 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #653 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #832 [ reuse-pipeline ] triggered by Bot
PR_Github #832 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #807 for commit f08a5ad