ChengShuting

Results 4 issues of ChengShuting

![image](https://user-images.githubusercontent.com/67726763/223937034-c7059536-3994-41f6-bf44-eb34dd6a1616.png)

`sampling_parameters = {"temperature": "0", "top_p": "0.5", "max_tokens": "300"}` `python3 client.py` ![image](https://github.com/triton-inference-server/server/assets/67726763/c7ed3236-7c47-4d55-a6d0-0c25aeb35658)

Why is the pruned model the same size as the original model when I use llama2-7B?

### System Info

env:
NVIDIA-SMI 550.54.15, Driver Version: 550.54.15, CUDA Version: 12.4
V100 16G*8
docker images: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3

### Who can help?

_No response_

### Information

- [ ] The official...

bug
triaged