Results: 3 issues by yuyu-san

Is there a way to run inference on OPT models in TensorParallel or PipelineParallel mode? As I understand it: * BLOOM uses the [llm provider](https://github.com/microsoft/DeepSpeed-MII/blob/main/mii/models/providers/llm.py), which loads the model weights as meta tensors first...

## Describe a requested feature I wonder if there is any plan to support 8-bit inference in parallelformers. Right now, we can load 🤗 Transformers models in 8-bit as described [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one#running-mixedint8-models-multi-gpu-setup), e.g.:...
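For context, a minimal sketch of the 8-bit loading path the linked docs describe (bitsandbytes-backed `load_in_8bit` with `device_map="auto"`). The `int8_load_kwargs` helper and the model name are illustrative, not part of either library:

```python
# Sketch of the 🤗 Transformers multi-GPU int8 setup referenced above.
# int8_load_kwargs is a hypothetical helper for illustration only.
def int8_load_kwargs(max_memory=None):
    """Keyword arguments for AutoModelForCausalLM.from_pretrained that
    enable bitsandbytes 8-bit weights sharded across available GPUs."""
    kwargs = {"device_map": "auto", "load_in_8bit": True}
    if max_memory is not None:
        # e.g. {0: "10GiB", 1: "10GiB"} to cap per-GPU memory
        kwargs["max_memory"] = max_memory
    return kwargs

# Actual load (requires transformers, bitsandbytes, and a CUDA GPU;
# model name is an example):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "facebook/opt-1.3b", **int8_load_kwargs()
# )
```

The question in the issue is whether parallelformers could accept an analogous flag alongside its own parallelization.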

enhancement

Hi Deepy team! Thanks for open-sourcing this demo agent; it's great to see high-quality implementations made available to the public. One question: in the Deepy architecture you mentioned a `built-in...