Engines generated by separate builds produce different results for the same input.
System Info
trt-llm v0.9.0
Who can help?
@byshiue
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- build the engine for test 1
- build the engine for test 2
- run the above 2 engines with the same input (a quick byte-identity check on the built engines is sketched below)
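For reference, a minimal sketch (hypothetical engine paths, standard library only) for checking whether the two builds even produced byte-identical engine files before comparing inference outputs:

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 hex digest of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical paths to the engines produced by test 1 and test 2.
digest_1 = sha256_of("engine_test1/rank0.engine")
digest_2 = sha256_of("engine_test2/rank0.engine")
print("byte-identical engines:", digest_1 == digest_2)
```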
Expected behavior
The outputs are the same.
actual behavior
The outputs are not the same.
additional notes
I tried to use the model cache when building, but it didn't help.
Is there any way to ensure that the engine generated by each build is identical? This is important for engineering deployment.
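Build-to-build differences in TensorRT typically come from kernel tactic auto-tuning, whose timing measurements vary from run to run, so reusing a serialized timing cache is the usual way to make successive builds pick the same tactics. A minimal sketch with the raw TensorRT Python builder API (not the `trtllm-build` CLI; the cache file name and the elided network/build steps are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Load a previously saved timing cache if one exists, otherwise start empty.
try:
    with open("timing.cache", "rb") as f:
        cache_data = f.read()
except FileNotFoundError:
    cache_data = b""
cache = config.create_timing_cache(cache_data)
config.set_timing_cache(cache, ignore_mismatch=False)

# ... define the network and call builder.build_serialized_network(network, config) ...

# Persist the cache so the next build reuses the same kernel tactics.
with open("timing.cache", "wb") as f:
    f.write(cache.serialize())
```

Note this is only a sketch of the underlying mechanism; I'm not certain it guarantees byte-identical engines or bit-identical outputs across builds.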
Can you provide more details, i.e., the commands, which would help us reproduce this issue?
I encountered the same issue. I ran trtllm-build 2 times with everything identical, but the inference results are slightly different between the 2 engines for the same input. I also found a similar report from others: https://github.com/NVIDIA/TensorRT-LLM/issues/2196
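To quantify "slightly different", something like the following could help (hypothetical file names, assuming the outputs of each engine were dumped as numpy arrays):

```python
import numpy as np

# Hypothetical dumps of the generated token IDs (or logits) from each engine.
out_a = np.load("outputs_engine_a.npy")
out_b = np.load("outputs_engine_b.npy")

print("identical:", np.array_equal(out_a, out_b))
# For float outputs (e.g. logits), report the largest deviation instead.
if out_a.dtype.kind == "f":
    print("max abs diff:", np.max(np.abs(out_a - out_b)))
```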
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.