DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Could you please add stronger support for LLaVA-NeXT? https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/
I used the same model and the same input, and found that the results produced by the two inference frameworks, vLLM and DeepSpeed-MII, were inconsistent. I need to configure...
Hello, I get this error when trying to start a server with a tf32 model: `NotImplementedError: Only fp16 and bf16 are supported`. Any idea or workaround? Thank you.
Now that token streaming support has been merged (#397), we can enable streaming responses in the OpenAI RESTful API endpoint. This PR * adds missing package dependencies for the OpenAI API server...
I use the transformers pipeline to generate JSON dictionaries, and I need to specify a `prefix_allowed_tokens_fn` so that the tokens that can be generated at certain steps are fixed. By looking...
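For context, a minimal sketch of what such a `prefix_allowed_tokens_fn` looks like in Hugging Face transformers: it receives the batch index and the tokens generated so far and returns the token IDs allowed at the next step. The token IDs and vocabulary size below are made-up placeholders, not real tokenizer values.

```python
from typing import List, Sequence

# Hypothetical token IDs for the JSON scaffolding we want to force;
# in practice you would look these up with tokenizer.convert_tokens_to_ids.
OPEN_BRACE_ID = 90   # stand-in ID for "{"
QUOTE_ID = 107       # stand-in ID for '"'
ALL_TOKEN_IDS = list(range(32000))  # stand-in for the full vocabulary


def prefix_allowed_tokens_fn(batch_id: int, input_ids: Sequence[int]) -> List[int]:
    """Return the token IDs allowed at the next generation step.

    Forces "{" at step 0 and '"' at step 1; afterwards any token is allowed.
    Note: in real generate() calls, input_ids also contains the prompt, so
    you would offset `step` by the prompt length.
    """
    step = len(input_ids)
    if step == 0:
        return [OPEN_BRACE_ID]
    if step == 1:
        return [QUOTE_ID]
    return ALL_TOKEN_IDS
```

With transformers this would be passed as `model.generate(..., prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)`; MII's batched generation path does not currently expose an equivalent hook, which is what this issue asks about.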
Llama-3 is one of the most popular models. Please support it.
Do you have plans to support Mixtral-8x22B?
https://github.com/microsoft/DeepSpeed-MII/blob/d5468112bffe2b93228bb9f6f16aef84029a3d30/mii/batching/postprocess.py#L39-L41 This line probably should be
```python
idx_list.extend(unprocessed_idx)
```
Otherwise, it may lead to the following error:
```shell
Exception in thread Thread-1: Traceback (most recent call last): File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner...
```
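To illustrate why `extend` matters here (a plain-Python sketch, independent of the MII code, assuming the current code appends the list instead): `append` would insert the whole list as one nested element, while `extend` adds its items individually.

```python
# Illustration of the append/extend distinction when collecting indices.
idx_list = [0, 1]
unprocessed_idx = [2, 3]

appended = idx_list.copy()
appended.append(unprocessed_idx)   # nests the list: [0, 1, [2, 3]]

extended = idx_list.copy()
extended.extend(unprocessed_idx)   # flat, as intended: [0, 1, 2, 3]
```

Any later code that treats each element of `idx_list` as an integer index would then fail on the nested list, consistent with the crash in the worker thread above.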