jessie-zhao issues

Results: 6 issues of jessie-zhao

Yolov5s model int8 calibration core dump issue

Hi all. I am running DeepStream 6.1 on an A10 under Ubuntu 20.04. When I run the YOLOv5s model with INT8 calibration, I get the issue below. Can someone help with this? #deepstream-app -c ./deepstream_app_config.txt ... Total...

New model support request

Model list: • https://huggingface.co/Nanbeige/Nanbeige2-8B-Chat • https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat • https://huggingface.co/codellama/CodeLlama-34b-hf Test criteria SLO: run concurrent-request tests with limits on TTFT and TPOT, and measure the maximum concurrency. case 1: • input 4096, output 1024 • TTFT: 3s, TPOT: 100ms case 2: • input...

user issue

multi-arc
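The SLO in the "New model support request" entry (TTFT within 3 s, TPOT within 100 ms, find the maximum concurrency) can be checked with a small helper. The following is only a sketch under stated assumptions: `RequestTiming`, `meets_slo`, and `max_concurrency` are hypothetical names, the timestamps are assumed to come from whatever load-test client is in use, and TPOT is taken as the mean gap between output tokens after the first one, which is one common definition and not necessarily the one the issue author intended.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestTiming:
    """Timestamps in seconds for one request (hypothetical record type)."""
    sent_at: float            # when the request was sent
    token_times: List[float]  # arrival time of each output token


def ttft(r: RequestTiming) -> float:
    """Time to first token: first token arrival minus send time."""
    return r.token_times[0] - r.sent_at


def tpot(r: RequestTiming) -> float:
    """Time per output token, assumed here to exclude the first token."""
    n = len(r.token_times)
    if n < 2:
        return 0.0
    return (r.token_times[-1] - r.token_times[0]) / (n - 1)


def meets_slo(requests: List[RequestTiming],
              max_ttft_s: float = 3.0,
              max_tpot_s: float = 0.100) -> bool:
    """True if every request stays within both limits (case 1 defaults)."""
    return all(ttft(r) <= max_ttft_s and tpot(r) <= max_tpot_s
               for r in requests)


def max_concurrency(results: Dict[int, List[RequestTiming]],
                    max_ttft_s: float = 3.0,
                    max_tpot_s: float = 0.100) -> int:
    """Highest tested concurrency level whose requests all meet the SLO.

    `results` maps a concurrency level to the timings measured at that
    level. Returns 0 if no level passes.
    """
    passing = [level for level, reqs in results.items()
               if reqs and meets_slo(reqs, max_ttft_s, max_tpot_s)]
    return max(passing, default=0)
```

Requiring every request to pass is the strictest reading of the SLO; a percentile-based check (for example p99) would be an equally plausible interpretation.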

OOM on multiple-ARC with vllm serving

Running a vLLM serving test on Arc produced the issue below: INFO 07-04 19:10:08 async_llm_engine.py:152] Aborted request cmpl-e5fb5cad96e9402dabbbece3611ae22f-0. INFO: 127.0.0.1:41772 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error ERROR: Exception in ASGI...

user issue

6K input OOM on ARC with VLLM-serving

Hit OOM on Arc with 6K input / 512 output using vLLM serving. Models: ChatGLM3-6B and Qwen1.5-32B, on 4 Arc GPUs.

user issue

Low parallel requests on Arc with VLLM serving

Got only 10 parallel requests on 2 Arc GPUs with a Qwen1.5 model (1024 input / 512 output). Could you please improve the performance?

user issue

Glm4-9b inference output error issue

When I verify the output of the glm4-9b-chat model in the following way, the serving side reports an error: curl --request POST \ --url http://127.0.0.1:8000/v1/chat/completions \ --header 'content-type: application/json' \ --data '{ "model": "glm-4-9b-chat", "temperature": 0.7, "top_p": 0.8, "messages": [ { "role": "system", "content": "Below is an instruction...

user issue

multi-arc