DeepSpeedExamples
Example models using DeepSpeed
This PR integrates LoRA optimization into the Stable Diffusion training example, building on the distillation support already implemented. Applying LoRA-enhanced distillation yields further improvements, including reduced inference time,...
Fix the following warning:

```bash
venv/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
```
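The warning asks callers to stop importing from `transformers.deepspeed` and use `transformers.integrations` instead. A minimal sketch of one way to migrate a codebase, assuming GNU `sed` and that the replacement module path is `transformers.integrations.deepspeed` (which is where recent transformers versions keep these helpers):

```shell
# Hedged sketch: rewrite deprecated import paths in place across a repo.
# Assumes GNU sed (-i with no suffix) and that the new location is
# transformers.integrations.deepspeed; review the diff before committing.
grep -rl --include='*.py' 'transformers\.deepspeed' . \
  | xargs -r sed -i 's/transformers\.deepspeed/transformers.integrations.deepspeed/g'
```

For example, `from transformers.deepspeed import HfDeepSpeedConfig` becomes `from transformers.integrations.deepspeed import HfDeepSpeedConfig`.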
## Versions

- deepspeed: `0.13.4`
- transformers: `4.38.1`
- Python: `3.10`
- PyTorch: `2.1.2+cu121`
- CUDA: `12.1`

## Error in Example (To reproduce)

Simply run this script: https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py

```bash
deepspeed --num_gpus 8 inference-test.py --model...
```
In my experiment, I found that if I use ZeRO-3 with the hybrid engine enabled, the actor generates repeated tokens or nothing at all during stage 3 (PPO) training. Here is...
Does the dschat example support CodeLlama model fine-tuning?
**System Info:**

- Memory: 500 GB
- GPU: 8 × A100 80 GB

**Question:** Why does initializing DeepSpeedRLHFEngine with multiple GPUs use much more memory than with a single GPU?

**Reproduce:** Copy...
The MII inference benchmark script computes throughput as [num_clients/latency](https://github.com/microsoft/DeepSpeedExamples/blob/master/benchmarks/inference/mii/src/postprocess_results.py#L73). Shouldn't this be `num_queries/latency`? Also, why use the P95 latency rather than the total time it took to process all the requests,...
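To make the contrast in the question concrete, here is a small sketch of the two throughput definitions being compared. The function names and numbers are illustrative, not taken from the benchmark code:

```python
def throughput_per_client_latency(num_clients: int, p95_latency_s: float) -> float:
    # What the script reportedly computes: clients divided by a latency figure.
    return num_clients / p95_latency_s


def throughput_wall_clock(num_queries: int, total_time_s: float) -> float:
    # What the question proposes: completed queries over total wall-clock time.
    return num_queries / total_time_s


# Illustrative run: 4 clients each issue 25 queries (100 total) over 40 s,
# with a P95 per-request latency of 2 s. The two definitions disagree.
print(throughput_per_client_latency(4, 2.0))   # 2.0
print(throughput_wall_clock(100, 40.0))        # 2.5
```

The wall-clock version counts every completed request against the full elapsed time, so it is insensitive to how latency is summarized (P95 vs. mean), which is precisely the point the question raises.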
Considering the advantages of [DPO (Direct Preference Optimization)](https://arxiv.org/abs/2305.18290), described as "stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing...
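For reference, the objective from the cited DPO paper (reproduced here for context, not from this issue) optimizes the policy directly on preference pairs, with $\sigma$ the logistic function, $\beta$ a temperature, and $y_w, y_l$ the preferred and dispreferred completions:

```math
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
```

Because this needs only the policy $\pi_\theta$ and a frozen reference $\pi_{\mathrm{ref}}$, it avoids the separate reward model and on-policy sampling that PPO-based RLHF requires.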
Hi, I tried to use the `get_model_profile` method to get the latency and FLOPs for my model. To avoid the influence of randomness, I used this method in...
https://github.com/microsoft/DeepSpeedExamples/blob/6c31d8ddee9e57f6202aeb4ee3c86f2fbd93d4c6/applications/DeepSpeed-Chat/dschat/utils/data/data_utils.py#L210 and L211, L214, L215