DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Results: 126 DeepSpeed-MII issues

New `microsoft/bloom-deepspeed-inference-fp16` and `microsoft/bloom-deepspeed-inference-int8` weights not working with DeepSpeed MII @jeffra @RezaYazdaniAminabadi

```
Traceback (most recent call last):
  File "scripts/bloom-inference-server/server.py", line 83, in
    model = DSInferenceGRPCServer(args)
  File "/net/llm-shared-nfs/nfs/mayank/BigScience-Megatron-DeepSpeed/scripts/bloom-inference-server/ds_inference/grpc_server.py", line 36,...
```

Hello, when running the following code I get a FileNotFoundError. Any idea why this happens? I followed the usual install via conda (pytorch+cuda) and `pip install .`

```
mii_configs...
```

As the subject says: if I have to deploy one model across more than one machine, what kind of configuration can I use?

enhancement

In AML deployments the model dir is not writeable; download the config/tokenizer to a writeable cache path.
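The fix described above can be sketched as follows. This is a minimal illustration, not MII's actual implementation: the preferred path is a made-up placeholder, and the only real detail assumed is that Hugging Face libraries honour the `TRANSFORMERS_CACHE` environment variable when downloading configs and tokenizers.

```python
import os
import tempfile


def writable_cache_dir(preferred="/var/mii-cache"):
    """Pick a cache path the process can write to, falling back to a temp dir.

    The `preferred` default is a hypothetical location, not the real AML layout.
    """
    candidate = preferred
    try:
        os.makedirs(candidate, exist_ok=True)
    except OSError:
        candidate = None
    if candidate is None or not os.access(candidate, os.W_OK):
        # Read-only model dir (as in AML deployments): use a fresh temp dir instead.
        candidate = tempfile.mkdtemp(prefix="mii-cache-")
    # Hugging Face libraries consult this variable for config/tokenizer downloads.
    os.environ["TRANSFORMERS_CACHE"] = candidate
    return candidate
```

Downstream `from_pretrained(...)` calls would then cache into the returned directory instead of the read-only model dir.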

Provide a local AML deployment option; this will use the [AML inference server](https://pypi.org/project/azureml-inference-server-http/) as the front end. We can then easily deploy an MII-generated score file via: `azmlinfsrv --model_dir --entry_script...

enhancement

I'm running into a CUDA OOM error when loading this model, due to its large size and the lack of multi-GPU support in the HF pipeline.

Allow users to pass a dictionary or [transformers.PretrainedConfig](https://huggingface.co/docs/transformers/v4.19.2/en/main_classes/configuration#transformers.PretrainedConfig) when deploying models.
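One way the requested feature could look is a shallow merge of user-supplied overrides over MII's defaults. This is only a sketch: the key names below mirror common `transformers.PretrainedConfig` fields, but the exact keys MII would accept, and `merge_model_config` itself, are assumptions for illustration.

```python
# Hypothetical defaults; real MII defaults may differ.
DEFAULT_MODEL_CONFIG = {
    "use_cache": True,
    "torch_dtype": "float16",
}


def merge_model_config(user_config=None):
    """Overlay user-supplied model-config overrides on the defaults (shallow merge).

    `user_config` stands in for the dict (or PretrainedConfig converted via
    `.to_dict()`) that a user would pass at deployment time.
    """
    merged = dict(DEFAULT_MODEL_CONFIG)
    if user_config:
        merged.update(user_config)
    return merged
```

A deployment call would then forward the merged dict when constructing the model, so users can override e.g. the dtype without touching the defaults.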

enhancement

After #25 is complete, we want to expose all DS-inference configs (https://deepspeed.readthedocs.io/en/latest/inference-init.html#deepspeed.init_inference) and ZeRO-inference configs in the MII config dictionary.
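A minimal sketch of what exposing those configs might look like: a nested section of the MII config dictionary whose entries are forwarded verbatim to `deepspeed.init_inference`. The key names `mp_size`, `dtype`, and `replace_with_kernel_inject` are real `deepspeed.init_inference` arguments, but the `"ds_inference_config"` nesting and the helper below are assumptions, not MII's actual schema.

```python
# Hypothetical MII config with a nested DS-inference section.
mii_config = {
    "port_number": 50050,
    "ds_inference_config": {
        "mp_size": 1,
        "dtype": "fp16",
        "replace_with_kernel_inject": True,
    },
}


def ds_inference_kwargs(config):
    """Pull out the kwargs that would be forwarded to deepspeed.init_inference."""
    return dict(config.get("ds_inference_config", {}))
```

MII would then call `deepspeed.init_inference(model, **ds_inference_kwargs(mii_config))`, so any new DS-inference knob becomes reachable without an MII code change.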

enhancement