DeepSpeedExamples
Example models using DeepSpeed
This PR integrates LoRA optimization into the Stable Diffusion training example, building on the distillation support already implemented. Applying LoRA-enhanced distillation yields further improvements, including reduced inference time,...
Fix the following warning:

```bash
venv/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
```
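The warning asks callers to stop importing from `transformers.deepspeed` and use `transformers.integrations` instead. A minimal sketch of one way to migrate a codebase, assuming GNU `sed` and that the replacement module path is `transformers.integrations.deepspeed` (which is where recent transformers versions keep these helpers):

```shell
# Hedged sketch: rewrite deprecated import paths in place across a repo.
# Assumes GNU sed (-i with no suffix) and that the new location is
# transformers.integrations.deepspeed; review the diff before committing.
grep -rl --include='*.py' 'transformers\.deepspeed' . \
  | xargs -r sed -i 's/transformers\.deepspeed/transformers.integrations.deepspeed/g'
```

For example, `from transformers.deepspeed import HfDeepSpeedConfig` becomes `from transformers.integrations.deepspeed import HfDeepSpeedConfig`.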
## Versions

- deepspeed: `0.13.4`
- transformers: `4.38.1`
- Python: `3.10`
- PyTorch: `2.1.2+cu121`
- CUDA: `12.1`

## Error in Example (To reproduce)

Simply run this script: https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py

```bash
deepspeed --num_gpus 8 inference-test.py --model...
```
In my experiment, I found that if I use ZeRO-3 with the hybrid engine enabled, the actor generates repeated tokens or nothing at all during stage 3 (PPO) training. Here is...
Does the dschat example support CodeLlama model fine-tuning?
**System Info:**

- Memory: 500 GB
- GPU: 8 × A100 80 GB

**Question:** Why does initializing DeepSpeedRLHFEngine with multiple GPUs use much more memory than with a single GPU?

**Reproduce:** Copy...
The MII inference benchmark script computes throughput as [num_clients/latency](https://github.com/microsoft/DeepSpeedExamples/blob/master/benchmarks/inference/mii/src/postprocess_results.py#L73). Shouldn't this be `num_queries/latency`? Also, why use the P95 latency rather than the total time it took to process all the requests,...
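To make the contrast in the question concrete, here is a small sketch of the two throughput definitions being compared. The function names and numbers are illustrative, not taken from the benchmark code:

```python
def throughput_per_client_latency(num_clients: int, p95_latency_s: float) -> float:
    # What the script reportedly computes: clients divided by a latency figure.
    return num_clients / p95_latency_s


def throughput_wall_clock(num_queries: int, total_time_s: float) -> float:
    # What the question proposes: completed queries over total wall-clock time.
    return num_queries / total_time_s


# Illustrative run: 4 clients each issue 25 queries (100 total) over 40 s,
# with a P95 per-request latency of 2 s. The two definitions disagree.
print(throughput_per_client_latency(4, 2.0))   # 2.0
print(throughput_wall_clock(100, 40.0))        # 2.5
```

The wall-clock version counts every completed request against the full elapsed time, so it is insensitive to how latency is summarized (P95 vs. mean), which is precisely the point the question raises.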
Considering the advantages of [DPO (Direct Preference Optimization)](https://arxiv.org/abs/2305.18290), described as "stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing...
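For reference, the objective from the cited DPO paper (reproduced here for context, not from this issue) optimizes the policy directly on preference pairs, with $\sigma$ the logistic function, $\beta$ a temperature, and $y_w, y_l$ the preferred and dispreferred completions:

```math
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
```

Because this needs only the policy $\pi_\theta$ and a frozen reference $\pi_{\mathrm{ref}}$, it avoids the separate reward model and on-policy sampling that PPO-based RLHF requires.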
Hi, I tried to use the `get_model_profile` method to get the latency and FLOPs for my model. To avoid the influence of randomness, I used this method in...
https://github.com/microsoft/DeepSpeedExamples/blob/6c31d8ddee9e57f6202aeb4ee3c86f2fbd93d4c6/applications/DeepSpeed-Chat/dschat/utils/data/data_utils.py#L210 and L211, L214, L215