DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Results: 149 DeepSpeed-MII issues

Could you please add support for LLaVA-NeXT (Stronger LLMs)? https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/

I ran the same model with the same input and found that the results from the two inference frameworks, vLLM and DeepSpeed-MII, were inconsistent. I need to configure...

Hello, I get this error when trying to start a server with a tf32 model: `NotImplementedError: Only fp16 and bf16 are supported`. Any idea or workaround? Thank you.

Now that token streaming support has been merged (#397), we can enable streaming responses in the OpenAI RESTful API endpoint. This PR * adds missing package dependencies for the OpenAI API server...
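A streaming OpenAI-style endpoint typically emits server-sent events, one `data:` line per chunk, terminated by `data: [DONE]`. The helper below is a minimal sketch of parsing that wire format on the client side; the exact payload shape is an assumption, not DeepSpeed-MII's actual implementation.

```python
def parse_sse_stream(raw: str) -> list[str]:
    """Collect the payloads of an SSE stream until the [DONE] sentinel.

    Assumes the OpenAI-style convention of one "data: <payload>" line
    per event; real payloads would be JSON chunks, simplified here.
    """
    tokens = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            tokens.append(payload)
    return tokens
```

In practice each payload would be a JSON delta that the client accumulates into the full completion.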

I use the transformers pipeline to generate JSON dictionaries, and I need to specify a `prefix_allowed_tokens_fn` so that the tokens that can be generated at certain steps are fixed. By looking...
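In transformers, `prefix_allowed_tokens_fn` receives the batch index and the tokens generated so far, and returns the list of token ids allowed at the next step. The sketch below shows the shape of such a callback with a toy vocabulary; the step-to-token mapping and the 100-token vocabulary are made-up assumptions for illustration (in real use, `input_ids` is a tensor and the step offset depends on the prompt length).

```python
# Toy constraint table: at these generation steps, only one token id
# is allowed (values are arbitrary for this sketch).
FIXED_TOKENS = {0: [42], 2: [7]}
ALL_TOKENS = list(range(100))  # stand-in for the full vocabulary


def prefix_allowed_tokens_fn(batch_id, input_ids):
    """Return the allowed next-token ids given the tokens so far.

    At constrained steps, only the fixed token may be generated;
    otherwise the whole vocabulary is allowed.
    """
    step = len(input_ids)  # in real use, subtract the prompt length
    return FIXED_TOKENS.get(step, ALL_TOKENS)
```

This callback would then be passed as `model.generate(..., prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)` in plain transformers; whether DeepSpeed-MII exposes an equivalent hook is exactly what this issue asks.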

Llama-3 is one of the most popular models. Please support it.

Do you have plans to support Mixtral-8x22B?

https://github.com/microsoft/DeepSpeed-MII/blob/d5468112bffe2b93228bb9f6f16aef84029a3d30/mii/batching/postprocess.py#L39-L41 This line should probably be
```python
idx_list.extend(unprocessed_idx)
```
Otherwise, it may lead to the following error:
```shell
Exception in thread Thread-1: Traceback (most recent call last): File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner...
```
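The distinction the report hinges on is `list.append` versus `list.extend`: appending a list nests it as a single element, while extending splices its items in. A minimal illustration with toy data (not DeepSpeed-MII's actual variables):

```python
idx_list = [0, 1]
unprocessed_idx = [2, 3]

# append nests the whole list as one element
nested = list(idx_list)
nested.append(unprocessed_idx)   # [0, 1, [2, 3]]

# extend splices the elements in, keeping the list flat
flat = list(idx_list)
flat.extend(unprocessed_idx)     # [0, 1, 2, 3]
```

Downstream code that iterates over `idx_list` expecting integer indices would crash on the nested sub-list, which would explain the worker-thread traceback quoted above.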