zhaotyer

Results: 7 issues by zhaotyer

**Description** When I use Triton Server 22.02 for dynamic-batch inference, a core dump occasionally occurs on the first inference after the model loads successfully. ![微信图片_20220816102231](https://user-images.githubusercontent.com/89376832/184784906-60d5b341-aceb-48b1-a030-3e4acc828083.png) ![微信图片_20220816102242](https://user-images.githubusercontent.com/89376832/184784978-c611239e-362a-489a-8472-5ff081f4ae98.png) **Triton Information** nvcr.io/nvidia/tritonserver:22.02-py3 Are...
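For context, dynamic batching in Triton is enabled per model in its `config.pbtxt`; a minimal sketch (the preferred batch sizes and queue delay below are illustrative values, not taken from this issue):

```
# Enable Triton's dynamic batcher for this model.
# preferred_batch_size and max_queue_delay_microseconds are example values.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

The scheduler groups incoming requests up to a preferred batch size, waiting at most the configured queue delay before dispatching.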

bug
investigating

I tried to integrate MII into Triton Server, but encountered some problems. Below is part of my code:

```python
class TritonPythonModel:
    def initialize(self, args):
        import mii
        from transformers import AutoTokenizer
        tensor_parallel_size...
```

Test environment: 1×A100 80G | vllm==0.2.6+cu118 | deepspeed-mii==0.2.0 | Llama-2-7b-chat-hf
Script: https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii
Test result: ![微信图片_20240130141631](https://github.com/microsoft/DeepSpeed-MII/assets/89376832/e71f537c-908c-43c7-8dd4-8347b7b67541)
Why is the performance lower than vLLM?

### Your current environment

The output of `python collect_env.py`:

```text
Collecting environment information...
PyTorch version: 2.3.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build...
```

bug

### Checklist

- [x] 1. I have searched related issues but could not get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- ...

### Proposal to improve performance

_No response_

### Report of performance regression

_No response_

### Misc discussion on performance

vllm command: `python3 -m vllm.entrypoints.openai.api_server --model ${model_path} --port 8108 --max-model-len 6500...`

performance

### Your current environment

The output of `python collect_env.py`:

```text
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build...
```

bug