Results: 3 issues by yuyu-san

Is there a way to run inference on OPT models in TensorParallel or PipelineParallel mode? As I understand it: * BLOOM uses the [llm provider](https://github.com/microsoft/DeepSpeed-MII/blob/main/mii/models/providers/llm.py), which loads the model weights as meta tensors first...

## Describe a requested feature I wonder if there is any plan to support 8-bit inference in parallelformers. Right now, we can load 🤗 Transformers models in 8-bit as described [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one#running-mixedint8-models-multi-gpu-setup), e.g.:...
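For context, a minimal sketch of the 8-bit loading path the linked docs describe (bitsandbytes-backed `load_in_8bit` with `device_map="auto"`). The `int8_load_kwargs` helper and the model name are illustrative, not part of either library:

```python
# Sketch of the 🤗 Transformers multi-GPU int8 setup referenced above.
# int8_load_kwargs is a hypothetical helper for illustration only.
def int8_load_kwargs(max_memory=None):
    """Keyword arguments for AutoModelForCausalLM.from_pretrained that
    enable bitsandbytes 8-bit weights sharded across available GPUs."""
    kwargs = {"device_map": "auto", "load_in_8bit": True}
    if max_memory is not None:
        # e.g. {0: "10GiB", 1: "10GiB"} to cap per-GPU memory
        kwargs["max_memory"] = max_memory
    return kwargs

# Actual load (requires transformers, bitsandbytes, and a CUDA GPU;
# model name is an example):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "facebook/opt-1.3b", **int8_load_kwargs()
# )
```

The question in the issue is whether parallelformers could accept an analogous flag alongside its own parallelization.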

enhancement

Hi Deepy team! Thanks for open-sourcing this demo agent; it's great to see high-quality implementations made available to the public. One question: in the Deepy architecture you mentioned a `built-in...