TensorRT-LLM Qwen2 (7B) generates duplicate text

model: Qwen2-7B
trt-llm: Engine version 0.13.0.dev2024082000
config: "temperature":0.1, "top_k":50, "top_p":0.9, "stream":false
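For context on the config above: temperature rescales the logits before top-k and top-p filtering, and a value as low as 0.1 concentrates almost all probability on the single most likely token. The C++ sketch below (standard library only) illustrates this. The helper name `filteredProbs` is hypothetical, and the filter order (temperature, then top-k, then top-p) is an assumption that may not match TRT-LLM's sampling kernels exactly.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper: turn raw logits into the distribution that is actually
// sampled from. Applies temperature scaling, then top-k, then top-p (nucleus)
// filtering, then renormalizes. The order is an assumption; TRT-LLM's kernels
// may differ in detail. Requires non-empty logits, temperature > 0, topK >= 1.
std::vector<double> filteredProbs(const std::vector<double>& logits,
                                  double temperature, int topK, double topP) {
    const int n = static_cast<int>(logits.size());
    // Softmax of logits / temperature, shifted by the max for stability.
    const double maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(n);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        probs[i] = std::exp((logits[i] - maxLogit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;

    // Token ids ordered by descending probability.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return probs[a] > probs[b]; });

    // Top-k: keep the k most likely tokens and renormalize among them.
    const int k = std::min(topK, n);
    double topKSum = 0.0;
    for (int r = 0; r < k; ++r) topKSum += probs[order[r]];

    // Top-p: walk the top-k tokens until their mass reaches topP.
    std::vector<double> kept(n, 0.0);
    double cumulative = 0.0;
    for (int r = 0; r < k; ++r) {
        const int id = order[r];
        kept[id] = probs[id] / topKSum;
        cumulative += kept[id];
        if (cumulative >= topP) break;
    }

    // Renormalize the surviving tokens so they sum to 1.
    const double keptSum = std::accumulate(kept.begin(), kept.end(), 0.0);
    for (double& p : kept) p /= keptSum;
    return kept;
}
```

For example, with hypothetical logits `{2.0, 1.5, 1.0, 0.5}`, `filteredProbs(logits, 1.0, 50, 0.9)` keeps all four tokens, while `filteredProbs(logits, 0.1, 50, 0.9)` leaves only the first token, with probability 1. In other words, this config samples in a way that is close to greedy decoding.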

output: 来听听Radiohead的《High and Dry》，high一点的歌，让你的心情high起来！\n歌曲信息：《High and Dry》Radiohead；\n歌曲描述：无。\n来听听Radiohead的《High and Dry》，high一点的歌，让你的心情high起来！\n歌曲信息：《High and Dry》Radiohead；\n歌曲描述：无。\n来听听Radiohead的《High and Dry》，high一点的歌，让你的心情high起来！\n歌曲信息：《High and Dry》Radiohead；\n歌曲描述：无。\n来听听Radiohead的《High and Dry》，high一点的歌，让你的心情high起来！\n歌曲信息：《High and Dry》Radiohead；\n歌曲描述：无。\n来听听Radiohead的《High and Dry》，high一点的歌，让你的心情high起来！\n歌曲信息：《High and Dry》Radiohead；\n歌曲描述：无。\n来听听Radiohead的《High and Dry》，high一点的歌，让你的心情high起来！\n歌曲信息：《High and Dry》Radiohead；\n歌曲描述：无。\n来听听Radiohead的《High and Dry》，high一点的歌，让你的心情high起来！\n歌曲信息：《High and Dry》Radiohead；\n歌曲描述：无。\n来听听Radiohead的《High and Dry》，high一点的歌，让你的心情
(English: "Listen to Radiohead's 'High and Dry', an upbeat song to get you in a high mood!\nSong info: 'High and Dry' Radiohead;\nSong description: none." The same three lines repeat verbatim until the output cuts off mid-sentence.)

Why does TRT-LLM generate duplicate text? Is it due to incorrect parameter settings?

Thanks

Sep 26 '24 09:09 w066650

Could the prompt be set up incorrectly?

Sep 27 '24 01:09 zhangts20

It works fine on other platforms. Does TRT-LLM have any special requirements?

Sep 27 '24 01:09 w066650

Could the default value of repetition_penalty be different? As far as I can see, the TRT-LLM default is 1.0.

Sep 27 '24 02:09 zhangts20

auto conf = texec::SamplingConfig{1};
conf.setRepetitionPenalty(1.0);
conf.setPresencePenalty(0.0);
conf.setFrequencyPenalty(0.0);
That is exactly what I set: 1.0. Also, could you share the source code for your batch_manager and executor? Many thanks.

Sep 27 '24 02:09 w066650

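The default of 1.0 does not penalize repeated output at all; you need to set a value greater than 1.0. Check what the default is on the other platform and whether the two are aligned. Also, I'm not on the TRT-LLM team, ha.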
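To illustrate why 1.0 is a no-op: a widely used formulation of the repetition penalty (the CTRL-style rule implemented in Hugging Face transformers) divides the positive logits of already-seen tokens by the penalty and multiplies their negative logits by it. The sketch below assumes that rule; the function name `applyRepetitionPenalty` is hypothetical, and TRT-LLM's kernel may differ in detail.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper implementing the CTRL-style repetition penalty (an
// assumed formulation, as in Hugging Face transformers). Each distinct token
// id already in the sequence has its logit divided by `penalty` if positive
// or multiplied by it if negative. For penalty > 1.0 a repeated token always
// becomes less likely; for penalty == 1.0 the logits are left unchanged.
void applyRepetitionPenalty(std::vector<float>& logits,
                            const std::vector<int>& previousTokens,
                            float penalty) {
    // Penalize each distinct token once, however many times it occurred.
    const std::unordered_set<int> seen(previousTokens.begin(),
                                       previousTokens.end());
    for (int id : seen) {
        if (id < 0 || id >= static_cast<int>(logits.size())) continue;
        float& logit = logits[id];
        logit = logit > 0.0f ? logit / penalty : logit * penalty;
    }
}
```

Presence and frequency penalties are usually additive (subtracted from the logits of seen tokens), so under that definition the 0.0 values in the snippet above are no-ops as well.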

Sep 27 '24 02:09 zhangts20

@zhangts20 Thanks a lot.

One more question:
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 39.39 GiB, available: 20.58 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 664

Do you know why only about 600 blocks were allocated with 20 GB of memory available? Without the source code I can't tell how the allocation works.
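A rough way to sanity-check the logged block count: every token needs K and V entries for every layer and KV head, and each block holds a fixed number of tokens. The sketch below is a simplified upper-bound estimate, not TRT-LLM's actual allocator. The function names are hypothetical, and the estimate ignores any other limits the runtime may apply.

```cpp
#include <cstdint>

// Hypothetical helper: bytes of KV cache one token occupies. The factor 2
// covers the K and V tensors, for every layer and every KV head, each
// headDim elements wide.
std::uint64_t kvBytesPerToken(std::uint64_t numLayers, std::uint64_t numKvHeads,
                              std::uint64_t headDim, std::uint64_t bytesPerElem) {
    return 2 * numLayers * numKvHeads * headDim * bytesPerElem;
}

// Hypothetical helper: upper-bound estimate of how many paged-KV blocks fit
// in freeBytes, given the fraction of free memory handed to the KV cache and
// the number of tokens stored per block. Ignores any other caps or pools.
std::uint64_t estimateKvBlocks(double freeBytes, double memFraction,
                               std::uint64_t tokensPerBlock,
                               std::uint64_t bytesPerToken) {
    const double blockBytes =
        static_cast<double>(tokensPerBlock * bytesPerToken);
    return static_cast<std::uint64_t>(freeBytes * memFraction / blockBytes);
}
```

As an illustration, assume the public Qwen2-7B config (28 layers, 4 KV heads, head size 128), an FP16 KV cache, 64 tokens per block, and 90% of the available memory given to the KV cache. All of these are assumptions to check against your engine and runtime settings. Under them, one token takes 56 KiB and one block takes 3.5 MiB, so 20.58 GiB would allow roughly 5,400 blocks. A logged count far below that suggests that different build or runtime settings are in effect, for example a larger `tokens_per_block`, a smaller memory fraction, or a cap on the number of KV cache tokens.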

Sep 27 '24 03:09 w066650

My understanding is that this depends on your build parameters. You can take a look at the Python runtime: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L1512

Sep 27 '24 06:09 zhangts20

@zhangts20 Hi, one more question: when TRT-LLM generates text, it occasionally goes wrong and I can't tell why. Llama is comparatively stable.

Normal output: "******************************************!"

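Occasional output: "以下是根据用户输入生成的：\n\n******************************************！\n\n注意：以上推荐内容是根据用户输入生成的，可能与实际情况有所出入。请根据实际情况进行调整。"
(English: "Here is what was generated from the user input:\n\n******************************************!\n\nNote: the recommendation above was generated from the user input and may differ from the actual situation. Please adjust it as needed.")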

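This extra text shows up occasionally. The prompt explicitly says not to output any surrounding context, yet that context still appears from time to time. Instruction following seems weak, and I'm not sure whether some parameter is set wrong.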

Sep 27 '24 07:09 w066650

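Try testing with sampling turned off. It is most likely a post-processing issue. Some logit-level error between TRT-LLM and the Python results is perfectly normal, but the output text shouldn't really differ.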
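Turning sampling off helps isolate the problem because of how greedy decoding works. The next token is then simply the argmax of the logits, so there is no randomness, and small numerical differences only matter when the top two candidates are nearly tied. Greedy decoding is typically selected with top_k = 1, but check how your TRT-LLM version expresses this in its sampling config. The helpers below are hypothetical and only illustrate that reasoning.

```cpp
#include <algorithm>
#include <iterator>
#include <limits>
#include <vector>

// Hypothetical helper: greedy decoding picks the id of the highest logit
// (the first one on ties). Requires non-empty logits.
int greedyToken(const std::vector<float>& logits) {
    const auto best = std::max_element(logits.begin(), logits.end());
    return static_cast<int>(std::distance(logits.begin(), best));
}

// Hypothetical helper: gap between the best and second-best logit. If every
// logit moves by less than half of this gap, the greedy choice cannot change.
// This is why small logit differences between two backends normally leave
// greedy output unchanged unless the top candidates are nearly tied.
float topTwoMargin(const std::vector<float>& logits) {
    float best = -std::numeric_limits<float>::infinity();
    float second = best;
    for (float x : logits) {
        if (x > best) {
            second = best;
            best = x;
        } else if (x > second) {
            second = x;
        }
    }
    return best - second;
}
```

If greedy output from TRT-LLM and the reference implementation still differs noticeably, the prompt template, stop tokens, or detokenization are more likely culprits than the logits themselves.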

Sep 28 '24 01:09 zhangts20

Close due to no recent update, please feel free to reopen it.

Oct 15 '24 02:10 Superjomn