ChengShuting issues

Results: 4 issues of ChengShuting

The Relation Extraction (RE) model does not support conversion to ONNX format

![image](https://user-images.githubusercontent.com/67726763/223937034-c7059536-3994-41f6-bf44-eb34dd6a1616.png)

Repeated answer: when I use vLLM with Qwen-7b-chat, the generated text does not end until the max length, with the repeated answer

`sampling_parameters = { "temperature": "0", "top_p": "0.5", "max_tokens": "300"}`

`python3 client.py`

![image](https://github.com/triton-inference-server/server/assets/67726763/c7ed3236-7c47-4d55-a6d0-0c25aeb35658)
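For readers who want to see how sampling parameters like these might travel in a request, here is a minimal sketch. It is not the `client.py` from the issue, which is not shown here. The `text_input` and `parameters` field names assume the layout of Triton's generate-style JSON requests, and `build_generate_request` and the prompt are hypothetical names used only for illustration.

```python
import json


def build_generate_request(prompt, sampling_parameters):
    """Build a JSON request body (assumed layout, not taken from the issue)."""
    # Start with a non-streaming request, then merge in the sampling parameters.
    parameters = {"stream": False}
    parameters.update(sampling_parameters)
    return json.dumps({"text_input": prompt, "parameters": parameters})


# Values copied from the issue excerpt; they are strings there, so they stay strings.
sampling_parameters = {"temperature": "0", "top_p": "0.5", "max_tokens": "300"}

# "Hello" is a placeholder prompt, not the one used in the issue.
body = build_generate_request("Hello", sampling_parameters)
print(body)
```

The sketch only serializes the request and does not contact a server, so it can be run without a Triton deployment.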

Why is the model the same size as the original model after pruning llama2-7B?

How to solve the problem of errors when loading qwen1.5-7B (using two GPUs) and llama3-8B (using two GPUs) models simultaneously using tritonserver?

### System Info

env: NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4

V100 16G*8

docker images: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3

### Who can help?

_No response_

### Information

- [ ] The official...

bug

triaged