[bug] SamplingConfig no_repeat_ngram_size=0 triggers an assertion failure
- TensorRT-LLM version: 0.17.0.post1
- https://nvidia.github.io/TensorRT-LLM/_modules/tensorrt_llm/runtime/generation.html#SamplingConfig
- Setting no_repeat_ngram_size=0 in SamplingConfig triggers the following assertion failure:
│ [TensorRT-LLM][ERROR] Assertion failed: no_repeat_ngram_size param (0.000000) is out of limits (0.000000, 340282346638528859811704183484516925440.000000] │
│ (/home/jenkins/agent/workspace/LLM/release-0.17/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/layers/layerUtils.h:65) │
│ 1 0x2aad620c7045 tensorrt_llm::common::throwRuntimeError(char const*, int, std::string const&) + 83 │
│ 2 0x2aad62376b0c void tensorrt_llm::layers::FillBuffers::operator()<int>(std::optional<std::vector<int, std::allocator<int> > > const&, int, │
│ std::shared_ptr<tensorrt_llm::runtime::IBuffer>, std::shared_ptr<tensorrt_llm::runtime::IBuffer>, std::shared_ptr<tensorrt_llm::runtime::IBuffer const>, │
│ std::pair<float, float> const&, std::string const&) const + 716 │
│ 3 0x2aad62377013 tensorrt_llm::layers::BanWordsLayer<float>::setup(int, int, std::shared_ptr<tensorrt_llm::runtime::ITensor const>, │
│ std::shared_ptr<tensorrt_llm::layers::BaseSetupParams> const&, std::shared_ptr<tensorrt_llm::runtime::DecodingLayerWorkspace> const&) + 547 │
│ 4 0x2aad62396977 tensorrt_llm::layers::DynamicDecodeLayer<float>::setup(int, int, std::shared_ptr<tensorrt_llm::runtime::ITensor const>, │
│ std::shared_ptr<tensorrt_llm::layers::BaseSetupParams> const&, std::shared_ptr<tensorrt_llm::runtime::DecodingLayerWorkspace> const&) + 487 │
│ 5 0x2aae1f6b860f torch_ext::FtDynamicDecode<float>::setup(unsigned long, unsigned long, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, bool, bool) + 1663 │
│ 6 0x2aae1f6a059b torch_ext::DynamicDecodeOp::setup(long, long, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, bool, bool) + 971 │
│ 7 0x2aae1f6b31a7 /usr/local/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(+0xb31a7) [0x2aae1f6b31a7] │
│ 8 0x2aae1f6b3bf9 std::_Function_handler<void (std::vector<c10::IValue, std::allocator<c10::IValue> >&), │
│ torch::class_<torch_ext::DynamicDecodeOp>::defineMethod<torch::detail::WrapMethod<void (torch_ext::DynamicDecodeOp::*)(long, long, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, bool, bool)> >(std::string, │
│ torch::detail::WrapMethod<void (torch_ext::DynamicDecodeOp::*)(long, long, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, std::optional<at::Tensor>, │
│ std::optional<at::Tensor>, std::optional<at::Tensor>, bool, bool)>, std::string, std::initializer_list<torch::arg>)::{lambda(std::vector<c10::IValue, │
│ std::allocator<c10::IValue> >&)#1}>::_M_invoke(std::_Any_data const&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 9 │
│ 9 0x2aab868af6de /usr/local/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xaaf6de) [0x2aab868af6de] │
│ 10 0x2aab86990551 /usr/local/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xb90551) [0x2aab86990551] │
│ 11 0x2aab86951541 /usr/local/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xb51541) [0x2aab86951541] │
│ 12 0x2aab862cb474 /usr/local/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x4cb474) [0x2aab862cb474] │
│ 13 0x2aab78ca329a /usr/local/bin/../lib/libpython3.10.so.1.0(+0x2a329a) [0x2aab78ca329a] │
│ 14 0x2aab78b61f4e _PyObject_MakeTpCall + 318 │
│ 15 0x2aab78c83a85 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x283a85) [0x2aab78c83a85] │
│ 16 0x2aab78dadcc5 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x3adcc5) [0x2aab78dadcc5] │
│ 17 0x2aab78ce8a6e _PyEval_EvalFrameDefault + 70846 │
│ 18 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 19 0x2aab78cdfef0 _PyEval_EvalFrameDefault + 35136 │
│ 20 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 21 0x2aab78d05c72 _PyEval_EvalFrameDefault + 190146 │
│ 22 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 23 0x2aab78c8384b /usr/local/bin/../lib/libpython3.10.so.1.0(+0x28384b) [0x2aab78c8384b] │
│ 24 0x2aab78d05c72 _PyEval_EvalFrameDefault + 190146 │
│ 25 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 26 0x2aab78c8384b /usr/local/bin/../lib/libpython3.10.so.1.0(+0x28384b) [0x2aab78c8384b] │
│ 27 0x2aab78d05fcc /usr/local/bin/../lib/libpython3.10.so.1.0(+0x305fcc) [0x2aab78d05fcc] │
│ 28 0x2aab78cdb1a4 _PyEval_EvalFrameDefault + 15348 │
│ 29 0x2aab78d818b1 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x3818b1) [0x2aab78d818b1] │
│ 30 0x2aab78d81c87 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x381c87) [0x2aab78d81c87] │
│ 31 0x2aab78b69b37 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x169b37) [0x2aab78b69b37] │
│ 32 0x2aab78cdc1f4 _PyEval_EvalFrameDefault + 19524 │
│ 33 0x2aab78d818b1 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x3818b1) [0x2aab78d818b1] │
│ 34 0x2aab78ce7b00 _PyEval_EvalFrameDefault + 66896 │
│ 35 0x2aab78d818b1 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x3818b1) [0x2aab78d818b1] │
│ 36 0x2aab78bc8c4c /usr/local/bin/../lib/libpython3.10.so.1.0(+0x1c8c4c) [0x2aab78bc8c4c] │
│ 37 0x2aab78bc8a29 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x1c8a29) [0x2aab78bc8a29] │
│ 38 0x2aab78bc94d2 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x1c94d2) [0x2aab78bc94d2] │
│ 39 0x2aab78ca2ee0 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x2a2ee0) [0x2aab78ca2ee0] │
│ 40 0x2aab78bac359 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x1ac359) [0x2aab78bac359] │
│ 41 0x2aab78bac243 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x1ac243) [0x2aab78bac243] │
│ 42 0x2aab78ca2d35 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x2a2d35) [0x2aab78ca2d35] │
│ 43 0x2aab78ce22b9 _PyEval_EvalFrameDefault + 44297 │
│ 44 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 45 0x2aab78cdfef0 _PyEval_EvalFrameDefault + 35136 │
│ 46 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 47 0x2aab78cdfef0 _PyEval_EvalFrameDefault + 35136 │
│ 48 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 49 0x2aab78cdfef0 _PyEval_EvalFrameDefault + 35136 │
│ 50 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 51 0x2aab78cdfef0 _PyEval_EvalFrameDefault + 35136 │
│ 52 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 53 0x2aab78cdfef0 _PyEval_EvalFrameDefault + 35136 │
│ 54 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 55 0x2aab78d05fcc /usr/local/bin/../lib/libpython3.10.so.1.0(+0x305fcc) [0x2aab78d05fcc] │
│ 56 0x2aab78cdb69d _PyEval_EvalFrameDefault + 16621 │
│ 57 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 58 0x2aab78d05fcc /usr/local/bin/../lib/libpython3.10.so.1.0(+0x305fcc) [0x2aab78d05fcc] │
│ 59 0x2aab78cdb69d _PyEval_EvalFrameDefault + 16621 │
│ 60 0x2aab78ddf523 /usr/local/bin/../lib/libpython3.10.so.1.0(+0x3df523) [0x2aab78ddf523] │
│ 61 0x2aab78ddd12e /usr/local/bin/../lib/libpython3.10.so.1.0(+0x3dd12e) [0x2aab78ddd12e] │
│ 62 0x2aab78ca2c5b /usr/local/bin/../lib/libpython3.10.so.1.0(+0x2a2c5b) [0x2aab78ca2c5b] │
│ 63 0x2aab78d05fcc /usr/local/bin/../lib/libpython3.10.so.1.0(+0x305fcc) [0x2aab78d05fcc] │
│ 64 0x2aab78cdb69d _PyEval_EvalFrameDefault + 16621 │
│ 65 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 66 0x2aab78d05fcc /usr/local/bin/../lib/libpython3.10.so.1.0(+0x305fcc) [0x2aab78d05fcc] │
│ 67 0x2aab78cdb69d _PyEval_EvalFrameDefault + 16621 │
│ 68 0x2aab78c8170c _PyFunction_Vectorcall + 540 │
│ 69 0x2aab78bc31bf /usr/local/bin/../lib/libpython3.10.so.1.0(+0x1c31bf) [0x2aab78bc31bf] │
│ 70 0x2aab78e225d6 Py_RunMain + 1750 │
│ 71 0x2aab78bc3b37 Py_BytesMain + 39 │
│ 72 0x2aab79e29d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x2aab79e29d90] │
│ 73 0x2aab79e29e40 __libc_start_main + 128 │
│ 74 0x55cf8ee2908e python(+0x108e) [0x55cf8ee2908e]
Thanks for reporting this, @weedge. I will follow up with my colleagues.
June
@weedge
To confirm my understanding: are you setting no_repeat_ngram_size to 0 because you want to disable the repeat penalty?
If so, would setting no_repeat_ngram_size to 1 also meet your needs?
June
@juney-nvidia Hi, yes, the intent is probably to disable the repeat penalty.
I am testing with a transformers GenerationConfig (GenerationConfig has no_repeat_ngram_size=0), like this:
generation_config = GenerationConfig.from_pretrained(
model, "generation_config.json"
).to_dict()
Then I update the TensorRT-LLM SamplingConfig with it:
sampling_config.update(**generation_config)
Generation then uses this sampling_config.
@weedge In TensorRT-LLM, if you want to disable the repeat penalty, you don't need to set no_repeat_ngram_size at all; it is None by default.
OK. I want to use GenerationConfig to store the user-defined config, so this boundary case may still need a fix.
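Until the boundary case is handled in the library, a possible workaround is to drop no_repeat_ngram_size from the dict whenever it is 0, so that SamplingConfig keeps its default of None. This is only a minimal sketch: the helper name drop_disabled_ngram_limit is hypothetical (not part of TensorRT-LLM or transformers), and a plain dict stands in for the output of GenerationConfig.to_dict().

```python
from typing import Any, Dict


def drop_disabled_ngram_limit(generation_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config without no_repeat_ngram_size when it is 0 or None.

    Hypothetical helper: 0 means "disabled" in transformers, while the
    TensorRT-LLM assertion above only accepts values strictly greater than 0.
    """
    cleaned = dict(generation_config)  # shallow copy, the input is left untouched
    if not cleaned.get("no_repeat_ngram_size"):
        cleaned.pop("no_repeat_ngram_size", None)
    return cleaned


# A plain dict standing in for GenerationConfig.from_pretrained(...).to_dict()
hf_config = {"temperature": 0.7, "top_k": 50, "no_repeat_ngram_size": 0}
cleaned = drop_disabled_ngram_limit(hf_config)
print(cleaned)  # {'temperature': 0.7, 'top_k': 50}
```

The cleaned dict could then be passed on as before, e.g. sampling_config.update(**cleaned). Positive values such as no_repeat_ngram_size=3 pass through unchanged.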