Engines generated by separate builds produce different results for the same input.
System Info
trt-llm v0.9.0
Who can help?
@byshiue
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- build the engine for test 1
- build the engine for test 2
- run the above 2 engines with the same input (a quick byte-identity check on the built engines is sketched below)
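For reference, a minimal sketch (hypothetical engine paths, standard library only) for checking whether the two builds even produced byte-identical engine files before comparing inference outputs:

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 hex digest of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical paths to the engines produced by test 1 and test 2.
digest_1 = sha256_of("engine_test1/rank0.engine")
digest_2 = sha256_of("engine_test2/rank0.engine")
print("byte-identical engines:", digest_1 == digest_2)
```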
Expected behavior
The outputs are the same.
actual behavior
The outputs are not the same.
additional notes
I tried to use the model cache when building, but it didn't help.
Is there any way to ensure that the engine generated by each build is identical? This is important for engineering deployment.
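Build-to-build differences in TensorRT typically come from kernel tactic auto-tuning, whose timing measurements vary from run to run, so reusing a serialized timing cache is the usual way to make successive builds pick the same tactics. A minimal sketch with the raw TensorRT Python builder API (not the `trtllm-build` CLI; the cache file name and the elided network/build steps are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Load a previously saved timing cache if one exists, otherwise start empty.
try:
    with open("timing.cache", "rb") as f:
        cache_data = f.read()
except FileNotFoundError:
    cache_data = b""
cache = config.create_timing_cache(cache_data)
config.set_timing_cache(cache, ignore_mismatch=False)

# ... define the network and call builder.build_serialized_network(network, config) ...

# Persist the cache so the next build reuses the same kernel tactics.
with open("timing.cache", "wb") as f:
    f.write(cache.serialize())
```

Note this is only a sketch of the underlying mechanism; I'm not certain it guarantees byte-identical engines or bit-identical outputs across builds.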
Can you provide more details, i.e., the commands, which would help us reproduce this issue?
I encountered the same issue. I ran trtllm-build 2 times with everything identical, but the inference results are slightly different between the 2 engines for the same input. I also found a similar report from others: https://github.com/NVIDIA/TensorRT-LLM/issues/2196
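To quantify "slightly different", something like the following could help (hypothetical file names, assuming the outputs of each engine were dumped as numpy arrays):

```python
import numpy as np

# Hypothetical dumps of the generated token IDs (or logits) from each engine.
out_a = np.load("outputs_engine_a.npy")
out_b = np.load("outputs_engine_b.npy")

print("identical:", np.array_equal(out_a, out_b))
# For float outputs (e.g. logits), report the largest deviation instead.
if out_a.dtype.kind == "f":
    print("max abs diff:", np.max(np.abs(out_a - out_b)))
```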
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.