
There are differences in the results of Qwen2-7B-Instruct


System Info

GPU: L20
TensorRT-LLM: v0.11.0
transformers: 4.42.0

Who can help?

@ncomly-nvidia @kaiyux

Prompt: `你好,请介绍一下喜马拉雅山的详细信息` ("Hello, please give a detailed introduction to the Himalayas")

1. transformers

Params:

```python
generation_config = GenerationConfig(
    top_k=1,
    temperature=1,
    max_length=2048,
    max_new_tokens=80,
    repetition_penalty=1.0,
    early_stopping=True,
    do_sample=True,
    num_beams=1,
    top_p=1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
```

transformers result:

```
喜马拉雅山(Himalayas)是地球上最高的山脉,位于亚洲南部,横跨中国、印度、尼泊尔、不丹、巴基斯坦和阿富汗等国家。以下是关于喜马拉雅山的一些详细信息:

地理位置与范围

喜马拉雅山脉从中国西藏的喜马拉雅山脉开始,向南延伸至印度的喜马拉雅山脉,, 128
```

English gloss: "The Himalayas are the highest mountain range on Earth, located in southern Asia and spanning China, India, Nepal, Bhutan, Pakistan, Afghanistan, and other countries. Here is some detailed information about the Himalayas: Geographic location and extent: The Himalaya range begins from the Himalayas in Tibet, China, and extends south to the Himalayas of India,, 128" (the trailing ",, 128" appears verbatim in the reported output).
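For completeness, a minimal sketch of how the transformers run above is presumably wired up; the model path, dtype, and device placement are assumptions, and `input_ids` comes from section 3 below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint path, matching the engine build in section 4 below.
model_dir = "/mnt/qwen2/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="cuda"
)

# input_ids built via the chat template, as in section 3 below.
output_ids = model.generate(input_ids.to(model.device), generation_config=generation_config)
# Strip the prompt so only the newly generated tokens are decoded.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```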

2. TensorRT-LLM

Params:

```python
batch_input_ids=input_ids,
max_new_tokens=80,
end_id=tokenizer.eos_token_id,
pad_id=tokenizer.pad_token_id,
top_k=1
```

TensorRT-LLM result:

```
你好!喜马拉雅山(Himalayas)是地球上最壮观的山脉之一,位于亚洲南部,横跨中国、印度、尼泊尔、不丹、巴基斯坦和阿富汗等国家。以下是关于喜马拉雅山的一些详细信息:

地理位置与范围

喜马拉雅山脉从中国西藏的喜马拉雅山脉开始,向南延伸至印度的
```

English gloss: "Hello! The Himalayas are one of the most spectacular mountain ranges on Earth, located in southern Asia and spanning China, India, Nepal, Bhutan, Pakistan, Afghanistan, and other countries. Here is some detailed information about the Himalayas: Geographic location and extent: The Himalaya range begins from the Himalayas in Tibet, China, and extends south to". Note the divergence from the transformers output in the very first sentence: "最高的山脉" ("the highest mountain range") vs. "最壮观的山脉之一" ("one of the most spectacular mountain ranges").
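A minimal sketch of how these kwargs are presumably passed, following the ModelRunner pattern in TensorRT-LLM's examples/run.py; the engine path and the list-of-tensors wrapping are assumptions based on that example:

```python
from tensorrt_llm.runtime import ModelRunner

# Assumed engine directory from the trtllm-build step in section 4 below.
runner = ModelRunner.from_dir(engine_dir="./fp16")

# examples/run.py passes batch_input_ids as a list of 1-D token tensors.
output_ids = runner.generate(
    batch_input_ids=[input_ids[0]],
    max_new_tokens=80,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.pad_token_id,
    top_k=1,
)
# The returned ids include the prompt; strip it before decoding.
print(tokenizer.decode(output_ids[0][0][input_ids.shape[-1]:], skip_special_tokens=True))
```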

3. How input_ids are created

```python
prompt = '你好,请介绍一下喜马拉雅山的详细信息'
messages = [{"role": "user", "content": prompt}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(prompt, truncation=True, return_tensors="pt", add_special_tokens=False)['input_ids']
```
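Since exact alignment presupposes byte-identical inputs, a quick illustrative sanity check that both backends consume the same ids:

```python
# Inspect the rendered chat prompt and the exact token ids.
print(prompt)                # text produced by apply_chat_template
print(input_ids.tolist())    # must be identical for both the HF and TRT-LLM runs
```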

4. Building the Qwen2-7B engine

```bash
python convert_checkpoint.py --model_dir /mnt/qwen2/Qwen2-7B-Instruct \
    --output_dir checkpoint \
    --dtype float16

trtllm-build --checkpoint_dir ./checkpoint \
    --output_dir ./fp16 \
    --gemm_plugin float16
```

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

  1. Run transformers and TensorRT-LLM separately with the same input.
  2. Compare the generated tokens for the same prompt; the outputs diverge (see the comparison sketch below).
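A small sketch of the token-level comparison in step 2; `hf_ids` and `trt_ids` are hypothetical Python lists holding only the newly generated token ids from each backend:

```python
def first_divergence(hf_ids, trt_ids):
    """Return the index of the first differing token, or None if fully aligned."""
    for i, (a, b) in enumerate(zip(hf_ids, trt_ids)):
        if a != b:
            return i
    if len(hf_ids) != len(trt_ids):   # one output is a strict prefix of the other
        return min(len(hf_ids), len(trt_ids))
    return None

idx = first_divergence(hf_ids, trt_ids)
if idx is None:
    print("outputs fully aligned")
else:
    print(f"first divergence at generated token {idx}: "
          f"hf={hf_ids[idx:idx+5]} vs trt={trt_ids[idx:idx+5]}")
```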

Expected behavior

I expect Qwen2 outputs from TensorRT-LLM to align exactly with transformers under greedy decoding (top_k=1).

actual behavior

1. There are some differences in the results.
2. Across many test cases, approximately 5-10% are not fully aligned.

additional notes

Nothing

skyCreateXian, Jul 26 '24 07:07