Reza Yazdani

Results: 21 issues by Reza Yazdani

This PR adds inference-kernel support for the CodeGen models, such as `Salesforce/codegen-xx-multi`. You can now run inference on the `Salesforce/codegen-16b-multi` model on a single A100-40G GPU without running out of memory (OOM). Here is how I...
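The summary does not say which dtype lets the 16B model fit. A back-of-the-envelope sketch, assuming half-precision (fp16) weights and a rounded count of 16 billion parameters, shows why a 40 GB card is plausible: fp16 weights need about 32 GB, while fp32 weights alone would need about 64 GB. The helper names are hypothetical, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Rough weight-memory estimate (weights only; ignores activations, KV cache,
# and framework overhead). The parameter count is a rounded assumption.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


CODEGEN_16B_PARAMS = 16e9  # approximate, assumption
A100_MEMORY_GB = 40

for dtype in ("fp32", "fp16"):
    need = weight_memory_gb(CODEGEN_16B_PARAMS, dtype)
    fits = need < A100_MEMORY_GB
    print(f"{dtype}: {need:.0f} GB of weights, fits on A100-40G: {fits}")
```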

Tested with GPT-J, GPT-Neo, and GPT-NeoX.

This PR adds the policy, containers, and some kernels for running the Falcon-40B model with tensor-model parallelism. Falcon-40B architecture overview: the Falcon model is an interesting model with an inference-friendly...
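The summary mentions tensor-model parallelism without showing the mechanics. The pure-Python sketch below illustrates the generic, Megatron-style column split of a linear layer: each rank multiplies the input by its own slice of the weight's output columns, and gathering the slices reproduces the full result. It is not DeepSpeed's Falcon container or kernel code; the function names are hypothetical, and on real hardware each shard would live on its own GPU with the concatenation done by an all-gather collective.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w: Matrix, world_size: int) -> List[Matrix]:
    """Split W's output columns into world_size equal, contiguous shards."""
    cols = len(w[0])
    assert cols % world_size == 0, "output dim must divide evenly across ranks"
    step = cols // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def column_parallel_linear(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    # Each hypothetical rank computes its partial output independently...
    partials = [matmul(x, shard) for shard in split_columns(w, world_size)]
    # ...and an all-gather (simulated here by list concatenation) rebuilds y.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
    assert column_parallel_linear(x, w, world_size=2) == matmul(x, w)
```

In practice, a column-parallel layer is usually paired with a following row-parallel layer so that the gather can be replaced by a single all-reduce; the sketch keeps only the column half for brevity.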

This PR adds support for running Falcon-40B on multiple A100 GPUs. To test this model, you can use this [PR](https://github.com/microsoft/DeepSpeedExamples/pull/557) in the DeepSpeedExamples repo.
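In fp16, Falcon-40B's weights alone take roughly 80 GB, more than a single A100-40G holds. A rough sizing sketch, assuming fp16 weights, about 40 billion parameters, power-of-two tensor-parallel degrees, and a hypothetical 80% of each GPU's memory budgeted for weights (the rest left for activations and the KV cache), suggests four GPUs as the smallest such split. None of these numbers come from the PR; the function name is hypothetical.

```python
def min_gpus_for_weights(
    num_params: float,
    bytes_per_param: int,
    gpu_mem_gb: float,
    usable_fraction: float = 0.8,  # assumption: headroom for activations/KV cache
) -> int:
    """Smallest power-of-two GPU count whose per-GPU weight shard fits the budget."""
    total_gb = num_params * bytes_per_param / 1e9
    budget_gb = gpu_mem_gb * usable_fraction
    n = 1
    # Tensor parallelism shards the weights evenly, so each GPU holds total/n.
    while total_gb / n > budget_gb:
        n *= 2
    return n


if __name__ == "__main__":
    # ~40B parameters in fp16 (2 bytes each) on 40 GB A100s.
    print(min_gpus_for_weights(40e9, 2, 40))
```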

This PR adds 2-, 4-, and 8-bit quantization support, which can be configured through a quantization config when injecting the inference kernels with `deepspeed.init_inference`. Here is an example...
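The summary is cut off before its example, so the exact config format is not shown here. As a generic illustration of what 2-, 4-, and 8-bit weight quantization means, the sketch below performs symmetric per-tensor quantization with a single shared scale. This is a simplification under stated assumptions: DeepSpeed's kernels may use group-wise scales, asymmetric ranges, or different rounding, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int) -> Tuple[List[int], float]:
    """Map floats to signed ints in [-(2^(b-1)-1), 2^(b-1)-1] with one shared scale."""
    if num_bits not in (2, 4, 8):
        raise ValueError("this sketch only handles 2, 4, or 8 bits")
    qmax = 2 ** (num_bits - 1) - 1  # 1, 7, or 127
    max_abs = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    scale = max_abs / qmax
    # Round to the nearest level, then clamp in case of float round-off.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 1.0, -0.3]
    for bits in (8, 4, 2):
        q, scale = quantize_symmetric(weights, bits)
        print(bits, q, dequantize(q, scale))
```

With 2 bits this symmetric scheme only uses the codes -1, 0, and 1, which makes it easy to see how coarse very low bit widths become.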

Adds some minor config changes to support INT4 inference through DeepSpeed-Inference. INT4 support will be added to DeepSpeed through this [PR](https://github.com/microsoft/DeepSpeed/pull/2526). cc: @stas00
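INT4 storage halves weight memory relative to INT8 by packing two 4-bit values into each byte. The sketch below shows one common packing scheme (two's-complement nibbles, low nibble first). The layout is an assumption for illustration, not necessarily what DeepSpeed-Inference uses, and the helper names are hypothetical.

```python
from typing import List


def pack_int4(values: List[int]) -> bytes:
    """Pack signed 4-bit ints (-8..7) two per byte, low nibble first."""
    if len(values) % 2:
        values = values + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in 4 bits")
        # Keep the low 4 bits of each value (two's complement) and combine them.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Inverse of pack_int4; count drops any padding nibble."""
    result = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 represent -8..-1.
            result.append(nibble - 16 if nibble >= 8 else nibble)
    return result[:count]
```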