TensorRT-LLM

feat:[AutoDeploy] E2E build example for llama4 VLM

Open • Fridah-nv opened this issue 8 months ago • 50 comments

Description

Add a unit test as a small build example for the Llama4 multimodal model. It demonstrates:

  1. processing image and text inputs with AutoProcessor.apply_chat_template()
  2. using torch.cond to accept both text+image inputs and text-only inputs

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests. Also runs the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous because, without user care and validation, it can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous because, without user care and validation, it can break the top of tree.

Fridah-nv • Apr 28 '25 21:04

/bot run

Fridah-nv • Apr 28 '25 21:04

PR_Github #3649 [ run ] triggered by Bot

tensorrt-cicd • Apr 28 '25 21:04

PR_Github #3649 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2580 completed with status: 'FAILURE'

tensorrt-cicd • Apr 28 '25 23:04

/bot run

Fridah-nv • Apr 29 '25 00:04

PR_Github #3656 [ run ] triggered by Bot

tensorrt-cicd • Apr 29 '25 00:04

PR_Github #3656 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2586 completed with status: 'FAILURE'

tensorrt-cicd • Apr 29 '25 01:04

/bot run

Fridah-nv • Apr 29 '25 02:04

PR_Github #3662 [ run ] triggered by Bot

tensorrt-cicd • Apr 29 '25 02:04

PR_Github #3662 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2590 completed with status: 'FAILURE'

tensorrt-cicd • Apr 29 '25 04:04

/bot run

Fridah-nv • Apr 29 '25 04:04

PR_Github #3681 [ run ] triggered by Bot

tensorrt-cicd • Apr 29 '25 04:04

PR_Github #3681 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2602 completed with status: 'FAILURE'

tensorrt-cicd • Apr 29 '25 08:04

/bot run

Fridah-nv • Apr 29 '25 16:04

PR_Github #3748 [ run ] triggered by Bot

tensorrt-cicd • Apr 29 '25 16:04

PR_Github #3748 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2653 completed with status: 'SUCCESS'

tensorrt-cicd • Apr 29 '25 18:04

are we planning to merge this?

suyoggupta • May 06 '25 01:05

/bot run

Fridah-nv • May 06 '25 06:05

PR_Github #4162 [ run ] triggered by Bot

tensorrt-cicd • May 06 '25 06:05

PR_Github #4162 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2974 completed with status: 'FAILURE'

tensorrt-cicd • May 06 '25 09:05

> are we planning to merge this?

It's good to merge once CI passes. Right now there's one test_ad_build_small failure that appears with my change.

Fridah-nv • May 06 '25 15:05

/bot run

Fridah-nv • May 06 '25 17:05

PR_Github #4251 [ run ] triggered by Bot

tensorrt-cicd • May 06 '25 17:05

PR_Github #4251 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #3039 completed with status: 'SUCCESS'

tensorrt-cicd • May 06 '25 19:05

/bot run --disable-fail-fast --stage-list "DGX_H100-4_GPUs-PyTorch-[Post-Merge]"

Fridah-nv • May 06 '25 19:05

PR_Github #4255 [ run ] triggered by Bot

tensorrt-cicd • May 06 '25 20:05

PR_Github #4255 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #3042 (Partly Tested) completed with status: 'SUCCESS'

tensorrt-cicd • May 06 '25 22:05

/bot run --post-merge --disable-fail-fast

Fridah-nv • May 07 '25 05:05

PR_Github #4317 [ run ] triggered by Bot

tensorrt-cicd • May 07 '25 05:05

PR_Github #4317 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #3093 completed with status: 'ABORTED'

tensorrt-cicd • May 07 '25 06:05

/bot run --disable-fail-fast --stage-list "DGX_H100-4_GPUs-PyTorch-[Post-Merge]"

Fridah-nv • May 07 '25 14:05