TensorRT-LLM test: Add gpqa tests for DeepSeek models

Add gpqa accuracy test script
Add gpqa accuracy tests
Update DeepSeek-v3 doc
Update qa test list

Mar 25 '25 10:03 lfr-0531

/bot run

Mar 25 '25 10:03 lfr-0531

PR_Github #427 [ run ] triggered by Bot

Mar 25 '25 10:03 niukuo

@LarryXFly for vis, since this can introduce new test cases for the QA pipeline.

June

Mar 25 '25 11:03 juney-nvidia

@lfr-0531 BTW, I notice that the GPQA test will be run at least twice. How long does it take to finish a single round of GPQA evaluation in this setting?

June

Mar 25 '25 11:03 juney-nvidia

Considering the running time, we currently run it only once. With MTP enabled, the test should finish in 20 to 30 minutes; without MTP, it may take around 1 hour.

Mar 25 '25 11:03 lfr-0531

PR_Github #427 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #366 completed with status: 'SUCCESS'

Mar 25 '25 13:03 niukuo

cc @syuoni for vis; let's consider supporting the GPQA task in the ongoing accuracy suite.

Mar 26 '25 00:03 QiJune

As more accuracy evaluation tasks are added to TRT-LLM, it seems increasingly necessary to unify them under a single entrypoint. cc @byshiue for vis.

Mar 26 '25 01:03 syuoni

/bot run

Mar 26 '25 16:03 lfr-0531

PR_Github #602 [ run ] triggered by Bot

Mar 26 '25 16:03 niukuo

PR_Github #602 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #511 completed with status: 'SUCCESS'

Mar 26 '25 19:03 niukuo

/bot run

Mar 27 '25 05:03 lfr-0531

PR_Github #641 [ run ] triggered by Bot

Mar 27 '25 05:03 tensorrt-cicd

/bot kill

Mar 27 '25 05:03 lfr-0531

/bot run

Mar 27 '25 05:03 lfr-0531

PR_Github #644 [ run ] triggered by Bot

Mar 27 '25 05:03 tensorrt-cicd

PR_Github #641 [ run ] completed with state ABORTED

Mar 27 '25 05:03 tensorrt-cicd

PR_Github #644 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #543 completed with status: 'SUCCESS'

Mar 27 '25 07:03 tensorrt-cicd

/bot run

Mar 27 '25 09:03 lfr-0531

PR_Github #650 [ run ] triggered by Bot

Mar 27 '25 09:03 tensorrt-cicd

PR_Github #650 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #549 completed with status: 'SUCCESS'

Mar 27 '25 11:03 tensorrt-cicd

/bot reuse-pipeline

Mar 27 '25 11:03 lfr-0531

PR_Github #652 [ reuse-pipeline ] triggered by Bot

Mar 27 '25 11:03 tensorrt-cicd

PR_Github #652 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #650 for commit 6d15d8b

Mar 27 '25 11:03 tensorrt-cicd