Tommy Yang

1 comment by Tommy Yang

I'm facing a similar issue when running inference with the Qwen-72B model. The build parameters used for the TRT build are:

```bash
python build.py --hf_model_dir ./Qwen-72B-chat/ \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    ...
```