Reza Yazdani issues

Results: 21 issues of Reza Yazdani

Increasing the token-length based on available memory for GPT models

Create GPT-Neo on CPU and set the correct device when running on gpus

This PR adds inference-kernel support for the CodeGen models, such as Salesforce/codegen-xx-multi. You can now run inference on the `Salesforce/codegen-16b-multi` model on one A100-40G without OOMing. Here is how I...

Add HE support for the rest of model containers

Tested with GPT-J, GPT-Neo, and GPT-NeoX.

Inference support for encoder-decoder architecture

Add FALCON-40B Inference-Kernel Support

This PR adds the Policy, Containers and some kernels for running the FALCON-40B model with tensor-model parallelism.

### FALCON-40B Architecture Overview

FALCON model is an interesting model with an inference-friendly...

Add FALCON Auto-TP Support

This PR adds support for running Falcon-40B on multiple A100 GPUs. To run the test for this model, you can use this [PR](https://github.com/microsoft/DeepSpeedExamples/pull/557) on the DeepSpeedExamples repo.

DS-Inference Quantization refresh: Fix several issues and add more features

This PR adds quantization support for 2, 4, and 8 bits which can be configured as a quantization config when injecting the inference kernels using `deepspeed.init_inference`. Here is an example...
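The PR's own example is cut off in this listing and is not reproduced here. As general background only, the following is a minimal pure-Python sketch of what symmetric quantization to 2, 4, or 8 bits means numerically. It is not DeepSpeed's implementation and does not show the `deepspeed.init_inference` configuration; the function names `quantize` and `dequantize` are hypothetical and chosen for illustration.

```python
def quantize(values, num_bits):
    """Symmetric per-tensor quantization of floats to signed integers.

    Illustrative only: hypothetical helper, not DeepSpeed's kernel.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    # Largest representable magnitude: 127 for 8 bits, 7 for 4 bits, 1 for 2 bits.
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the symmetric range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    for bits in (2, 4, 8):
        q, scale = quantize(weights, bits)
        print(bits, q, dequantize(q, scale))
```

Fewer bits mean fewer integer levels and a larger scale, so the round-trip error (at most half the scale per value in this scheme) grows as the bit width shrinks.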

Add configs to run int4 inference

Add some minor config changes to support int4 inference through DeepSpeed-Inference. Int4 support will be added to DeepSpeed through this [PR](https://github.com/microsoft/DeepSpeed/pull/2526). cc: @stas00
