DeepSpeedExamples

Example models using DeepSpeed

Results 274 DeepSpeedExamples issues

I deployed the model using the chat.py script and the model answered normally, but the output of the actor model was repeated throughout step 3. chat.py: ![Capture](https://github.com/microsoft/DeepSpeedExamples/assets/51126181/5ce07d8a-1b8c-4887-a13b-ca52a6165c95) step3: ![Capture 2](https://github.com/microsoft/DeepSpeedExamples/assets/51126181/a41fd2c6-5cef-484e-ae90-80f6879d45ef) actor...

When I run step 3 I get an out-of-memory error. The error occurs at line 1072 of /usr/local/lib/python3.10/dist-packages/deepspeed-0.10.1+cbf2f61a-py3.10.egg/deepspeed/runtime/engine.py (`if not self.dont_change_device: self.module.to(self.device)`). The program completely copied the...

Fixes https://github.com/microsoft/DeepSpeedExamples/issues/660. In the SFT stage, the source part must not contribute to the loss calculation; only the completion part should be considered. To address this issue, I've...
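The idea described in this pull request, excluding the source (prompt) tokens from the SFT loss, can be illustrated with a small sketch. This is not the code from the pull request: the helper names `build_labels` and `masked_nll` and the token ids are hypothetical, and the sentinel value -100 is borrowed from the default `ignore_index` of PyTorch's cross-entropy loss. The one-position label shift used by causal language models is omitted; the per-position log-probabilities are assumed to be already aligned with the labels.

```python
import math

# Assumption: -100 mirrors the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def build_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion; mask the prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


def masked_nll(log_probs, labels):
    """Mean negative log-likelihood over positions whose label is not masked.

    log_probs[t] is a dict mapping token id -> log-probability at position t,
    assumed to be already aligned with labels[t].
    """
    kept = [(lp, y) for lp, y in zip(log_probs, labels) if y != IGNORE_INDEX]
    if not kept:
        return 0.0
    return -sum(lp[y] for lp, y in kept) / len(kept)


prompt_ids = [11, 12, 13]   # hypothetical token ids for the source part
completion_ids = [21, 22]   # hypothetical token ids for the completion part
input_ids, labels = build_labels(prompt_ids, completion_ids)

# Toy model output: probability 0.5 on the observed token at every position.
log_probs = [{tok: math.log(0.5)} for tok in input_ids]
loss = masked_nll(log_probs, labels)
```

Only the two completion positions enter the average, so the three prompt positions have no effect on the loss, whatever the model predicts there.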

When I use the CLI interface I get some errors. Is this a bug? Enter image pathes, seperate by space (only support one image per time for now) (type 'na'...

- Example: DeepSpeed-Chat
- Model: Llama2-7b-hf
- Mode: LoRA, lora_dim=128
- Precision: FP16
- Output log as below:
- Question: Does the log mean it's training correctly? I found the...

I am trying to run the CIFAR example, but the one in the training folder gives this error and the one in the compression folder gives a different error. Python=3.9.16 PyTorch=1.13.0 DeepSpeed=0.9.5 CUDA=11.7...

Hi, I cannot run step-1 SFT training after the refactoring. I installed DeepSpeed with `pip install deepspeed>=0.9.0` and then did this in the folder applications/DeepSpeed-Chat:
```
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
```
...

Hi, when training RLHF step 3, I set the epoch-related parameters to ppo_epochs = 1 and num_train_epochs = 30, and I found that the number of lines in "actor_loss",...

I am testing the 1.3B training. Steps 1 and 2 have completed, but the reward does not change after step 3 finishes. I used LoRA to train for...
