test: Add gpqa tests for DeepSeek models

Open lfr-0531 opened this issue 9 months ago • 8 comments

  • Add gpqa accuracy test script
  • Add gpqa accuracy tests
  • Update DeepSeek-v3 doc
  • Update qa test list

lfr-0531 avatar Mar 25 '25 10:03 lfr-0531

/bot run

lfr-0531 avatar Mar 25 '25 10:03 lfr-0531

PR_Github #427 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 10:03 niukuo

@LarryXFly for visibility, since this can introduce new test cases for the QA pipeline.

June

juney-nvidia avatar Mar 25 '25 11:03 juney-nvidia

@lfr-0531 BTW, I notice that the GPQA test will be run at least twice. How long does it take to finish a single round of GPQA evaluation in this setting?

June

juney-nvidia avatar Mar 25 '25 11:03 juney-nvidia

@lfr-0531 BTW, I notice that the GPQA test will be run at least twice. How long does it take to finish a single round of GPQA evaluation in this setting?

June

Considering the running time, we currently run it only once. With MTP enabled, it should finish in 20 to 30 minutes. Without MTP, the test may need around 1 hour.

lfr-0531 avatar Mar 25 '25 11:03 lfr-0531

PR_Github #427 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #366 completed with status: 'SUCCESS'

niukuo avatar Mar 25 '25 13:03 niukuo

cc @syuoni for visibility. Let's consider supporting the GPQA task in the ongoing accuracy suite.

QiJune avatar Mar 26 '25 00:03 QiJune

cc @syuoni for visibility. Let's consider supporting the GPQA task in the ongoing accuracy suite.

As more accuracy evaluation tasks are added to TRT-LLM, it seems increasingly necessary to unify them under a single entry point. cc @byshiue for visibility.

syuoni avatar Mar 26 '25 01:03 syuoni

/bot run

lfr-0531 avatar Mar 26 '25 16:03 lfr-0531

PR_Github #602 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 16:03 niukuo

PR_Github #602 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #511 completed with status: 'SUCCESS'

niukuo avatar Mar 26 '25 19:03 niukuo

/bot run

lfr-0531 avatar Mar 27 '25 05:03 lfr-0531

PR_Github #641 [ run ] triggered by Bot

tensorrt-cicd avatar Mar 27 '25 05:03 tensorrt-cicd

/bot kill

lfr-0531 avatar Mar 27 '25 05:03 lfr-0531

/bot run

lfr-0531 avatar Mar 27 '25 05:03 lfr-0531

PR_Github #644 [ run ] triggered by Bot

tensorrt-cicd avatar Mar 27 '25 05:03 tensorrt-cicd

PR_Github #641 [ run ] completed with state ABORTED

tensorrt-cicd avatar Mar 27 '25 05:03 tensorrt-cicd

PR_Github #644 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #543 completed with status: 'SUCCESS'

tensorrt-cicd avatar Mar 27 '25 07:03 tensorrt-cicd

/bot run

lfr-0531 avatar Mar 27 '25 09:03 lfr-0531

PR_Github #650 [ run ] triggered by Bot

tensorrt-cicd avatar Mar 27 '25 09:03 tensorrt-cicd

PR_Github #650 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #549 completed with status: 'SUCCESS'

tensorrt-cicd avatar Mar 27 '25 11:03 tensorrt-cicd

/bot reuse-pipeline

lfr-0531 avatar Mar 27 '25 11:03 lfr-0531

PR_Github #652 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd avatar Mar 27 '25 11:03 tensorrt-cicd

PR_Github #652 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #650 for commit 6d15d8b

tensorrt-cicd avatar Mar 27 '25 11:03 tensorrt-cicd