TensorRT-LLM perf: Use pinned H2D to reduce bubbles

In some cases, pageable host-to-device (H2D) copies are followed by cudaStreamSynchronize calls, which block the CPU from launching further kernels and leave bubbles on the GPU. Switching these copies from pageable to pinned host memory solves the problem.
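
For illustration only, here is a minimal PyTorch sketch of the pageable-versus-pinned distinction. It is not the diff from this PR: `copy_to_device` is a hypothetical helper, and the use of `torch.tensor(..., pin_memory=True)` together with `non_blocking=True` is an assumption about how such a copy could be written. It falls back to a plain copy when CUDA is not available.

```python
import torch


def copy_to_device(values, dtype=torch.int32, device=None):
    """Hypothetical helper: copy a small Python list to `device`,
    staging it in pinned host memory when the target is a CUDA device."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    use_cuda = torch.device(device).type == "cuda"

    # Pageable variant (the pattern described above as problematic):
    #   torch.tensor(values, dtype=dtype, device="cuda")
    # Pinned variant: allocate the host tensor in page-locked memory.
    # Pinning requires CUDA, so it is skipped for CPU targets.
    host = torch.tensor(values, dtype=dtype, pin_memory=use_cuda)

    # With a pinned source, non_blocking=True allows the copy to be
    # enqueued on the stream without the CPU waiting for it to finish.
    return host.to(device, non_blocking=use_cuda)
```

Usage would look like `seq_lens = copy_to_device([5, 7, 3])`. Whether this removes a given synchronization depends on the call site, so a profiler trace is the way to confirm the bubble is gone.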

Mar 29 '25 09:03 jinyangyuan-nvidia

/bot run

Mar 29 '25 09:03 jinyangyuan-nvidia

PR_Github #688 [ run ] triggered by Bot

Mar 29 '25 09:03 tensorrt-cicd

PR_Github #688 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #577 completed with status: 'SUCCESS'

Mar 29 '25 19:03 tensorrt-cicd

/bot run --add-multi-gpu-test

Apr 03 '25 08:04 jinyangyuan-nvidia

PR_Github #1096 [ run ] triggered by Bot

Apr 03 '25 08:04 tensorrt-cicd

PR_Github #1096 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #836 completed with status: 'FAILURE'

Apr 03 '25 17:04 tensorrt-cicd

/bot run --add-multi-gpu-test

Apr 04 '25 07:04 jinyangyuan-nvidia

PR_Github #1165 [ run ] triggered by Bot

Apr 04 '25 07:04 tensorrt-cicd

PR_Github #1165 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #874 completed with status: 'SUCCESS'

Apr 04 '25 11:04 tensorrt-cicd

/bot reuse-pipeline

Apr 04 '25 13:04 jinyangyuan-nvidia

PR_Github #1176 [ reuse-pipeline ] triggered by Bot

Apr 04 '25 13:04 tensorrt-cicd

PR_Github #1176 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #1165 for commit 2835f2b

Apr 04 '25 14:04 tensorrt-cicd

/bot reuse-pipeline

Apr 04 '25 14:04 jinyangyuan-nvidia

PR_Github #1178 [ reuse-pipeline ] triggered by Bot

Apr 04 '25 14:04 tensorrt-cicd

PR_Github #1178 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #1165 for commit 2efb7da

Apr 04 '25 14:04 tensorrt-cicd