DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Could you please add stronger support for LLaVA-NeXT? https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/
I used the same model and the same input, and found that the results produced by the two inference frameworks, vLLM and DeepSpeed-MII, were inconsistent. I need to configure...
Hello, I get this error when trying to start a server with a tf32 model: `NotImplementedError: Only fp16 and bf16 are supported`. Any idea or workaround? Thank you.
Now that token streaming support has been merged (#397), we can enable streaming responses in the OpenAI RESTful API endpoint. This PR * adds missing package dependencies for the OpenAI API server...
I use the transformers pipeline to generate JSON dictionaries, and I need to specify a `prefix_allowed_tokens_fn` so that the tokens that can be generated at certain steps are fixed. By looking...
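For context, a minimal sketch of what such a `prefix_allowed_tokens_fn` looks like in Hugging Face transformers: it receives the batch index and the tokens generated so far and returns the token IDs allowed at the next step. The token IDs and vocabulary size below are made-up placeholders, not real tokenizer values.

```python
from typing import List, Sequence

# Hypothetical token IDs for the JSON scaffolding we want to force;
# in practice you would look these up with tokenizer.convert_tokens_to_ids.
OPEN_BRACE_ID = 90   # stand-in ID for "{"
QUOTE_ID = 107       # stand-in ID for '"'
ALL_TOKEN_IDS = list(range(32000))  # stand-in for the full vocabulary


def prefix_allowed_tokens_fn(batch_id: int, input_ids: Sequence[int]) -> List[int]:
    """Return the token IDs allowed at the next generation step.

    Forces "{" at step 0 and '"' at step 1; afterwards any token is allowed.
    Note: in real generate() calls, input_ids also contains the prompt, so
    you would offset `step` by the prompt length.
    """
    step = len(input_ids)
    if step == 0:
        return [OPEN_BRACE_ID]
    if step == 1:
        return [QUOTE_ID]
    return ALL_TOKEN_IDS
```

With transformers this would be passed as `model.generate(..., prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)`; MII's batched generation path does not currently expose an equivalent hook, which is what this issue asks about.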
Llama-3 is one of the most popular models. Please support it.
Do you have plans to support Mixtral-8x22B?
https://github.com/microsoft/DeepSpeed-MII/blob/d5468112bffe2b93228bb9f6f16aef84029a3d30/mii/batching/postprocess.py#L39-L41 This line probably should be
```python
idx_list.extend(unprocessed_idx)
```
Otherwise, it may lead to the following error:
```shell
Exception in thread Thread-1: Traceback (most recent call last): File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner...
```
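To illustrate why `extend` matters here (a plain-Python sketch, independent of the MII code, assuming the current code appends the list instead): `append` would insert the whole list as one nested element, while `extend` adds its items individually.

```python
# Illustration of the append/extend distinction when collecting indices.
idx_list = [0, 1]
unprocessed_idx = [2, 3]

appended = idx_list.copy()
appended.append(unprocessed_idx)   # nests the list: [0, 1, [2, 3]]

extended = idx_list.copy()
extended.extend(unprocessed_idx)   # flat, as intended: [0, 1, 2, 3]
```

Any later code that treats each element of `idx_list` as an integer index would then fail on the nested list, consistent with the crash in the worker thread above.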