feat: update allreduce benchmark
This MR moves the allreduce benchmark from the TRT flow to the PyTorch flow and adds support for CUDA graphs and norm fusion.
/bot run
/bot run
PR_Github #239 [ run ] triggered by Bot
PR_Github #241 [ run ] triggered by Bot
PR_Github #239 [ run ] completed with state ABORTED
/bot run
PR_Github #252 [ run ] triggered by Bot
PR_Github #241 [ run ] completed with state ABORTED
/bot run
PR_Github #280 [ run ] triggered by Bot
PR_Github #252 [ run ] completed with state ABORTED
PR_Github #280 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #270 completed with status: 'FAILURE'
/bot run
PR_Github #324 [ run ] triggered by Bot
PR_Github #324 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #304 completed with status: 'FAILURE'
/bot run
Closing because the branch is out of sync.