Tommy Yang comments



Tommy Yang

Issue with token number: how to increase processed input tokens: models llama and phi, with 4 GPUs.

I'm facing a similar issue when running inference with the Qwen-72B model. The build parameters used for trt are:

```bash
python build.py --hf_model_dir ./Qwen-72B-chat/ \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    ...
```