DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Results: 149 DeepSpeed-MII issues, sorted by recently updated

Hi, I am serving the llama-2-7b-hf model with MII, using tensor-parallel size 1. When the input is not very long, the output is generated properly. However, when the length...
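For reference, a minimal sketch of such a deployment using the persistent `mii.serve` API; the Hugging Face model id `meta-llama/Llama-2-7b-hf` is an assumption, as the exact id is not given above:

```python
# Minimal sketch: a persistent MII deployment on a single GPU.
# The model id is an assumption; substitute your local llama-2-7b-hf path.
import mii

client = mii.serve("meta-llama/Llama-2-7b-hf", tensor_parallel=1)
response = client.generate("DeepSpeed is", max_new_tokens=128)
print(response)
```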

Hi, I served this model from Hugging Face: 01-ai/Yi-6B-200K. When requesting with an input of length 100K, this error occurs:

When I run the examples in [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/blob/master/benchmarks/inference/mii/server.py) to start a server, it occupies all the GPU memory from the beginning. Is it possible to configure the maximum GPU memory that it...

My understanding is that we have to build a FastAPI wrapper: during the initialization phase we call `client = mii.client("mistralai/Mistral-7B-v0.1")`, and we implement a handler that calls `client.generate`.
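A minimal sketch of that wrapper, assuming a MII deployment for `mistralai/Mistral-7B-v0.1` has already been started with `mii.serve`; the route path and request schema are illustrative, not part of MII:

```python
# Sketch of a FastAPI wrapper around a running MII deployment.
# The /generate route and GenerateRequest schema are assumptions; recent
# MII clients return Response objects exposing a generated_text attribute.
import mii
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
client = mii.client("mistralai/Mistral-7B-v0.1")  # connect once at startup

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    # client.generate accepts a single prompt or a list of prompts
    response = client.generate(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": response[0].generated_text}
```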

Hi everyone, I am new to DeepSpeed MII, and I have just made several attempts following `pipeline.py` in the provided examples. Everything works fine initially with small models, such...
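For context, the non-persistent pipeline pattern from those examples looks roughly like this; the model id is an illustrative placeholder:

```python
# Minimal sketch of non-persistent mii.pipeline usage, following the
# pattern of pipeline.py in the MII examples.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
for r in responses:
    print(r.generated_text)
```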

Thank you for your hard work. I am really excited about MII's performance. I have some questions: is token streaming supported now? If token streaming is supported, I would...
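For reference, a hedged sketch of what streaming looks like in recent MII client versions, assuming a `streaming_fn` callback parameter (verify against your installed release) and an already-running deployment:

```python
# Hedged sketch: token streaming via a callback. The streaming_fn
# parameter is an assumption based on recent MII releases.
import mii

client = mii.client("mistralai/Mistral-7B-v0.1")

def on_token(response):
    # each callback receives the newly generated text fragment(s)
    print(response[0].generated_text, end="", flush=True)

client.generate("DeepSpeed is", max_new_tokens=64, streaming_fn=on_token)
```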

@mrwyattii Using the latest main branch, with llamav2-7b as the test model: when I use tp=4 to test a single-sentence inference, it costs 267.98s, but with tp=1 it costs 7s to...

Please add support for **Mosaic MPT** models and **some other architectures with fewer than 1B parameters.** Also, it would be great if there could be some instructions on how someone can...