DeepSpeedExamples issues

Results: 274 DeepSpeedExamples issues, sorted by most recently updated

Why does the model answer normally with the chat.py script but answer repetitively in step 3?

I deployed the model using the chat.py script and it answered normally, but the actor model's output was repetitive throughout step 3. chat.py: ![Capture](https://github.com/microsoft/DeepSpeedExamples/assets/51126181/5ce07d8a-1b8c-4887-a13b-ca52a6165c95) step3: ![Capture2](https://github.com/microsoft/DeepSpeedExamples/assets/51126181/a41fd2c6-5cef-484e-ae90-80f6879d45ef) actor...

Luoxiaohei41

Running step 3 with OPT-66B reports an out-of-memory error

When I run step 3, I get an out-of-memory error. The error is raised at line 1072 of /usr/local/lib/python3.10/dist-packages/deepspeed-0.10.1+cbf2f61a-py3.10.egg/deepspeed/runtime/engine.py (`if not self.dont_change_device: self.module.to(self.device)`). The program completely copied the...

lljjgg
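A rough sense of scale for the error above: if the whole OPT-66B module is materialized on a single device at the `self.module.to(self.device)` line, the weights alone need a large amount of memory. The sketch below is a back-of-envelope estimate only; it assumes a nominal 66e9 parameters and 2 bytes per parameter for fp16/bf16 (4 for fp32), and it ignores gradients, optimizer states, activations, and any other models loaded in step 3. The helper name `weight_memory_gib` is hypothetical.

```python
# Back-of-envelope estimate of memory for model weights only (illustrative).
# Assumptions: nominal 66e9 parameters; 2 bytes/param for fp16/bf16, 4 for fp32.
# NOT included: gradients, optimizer states, activations, other step-3 models.


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for precision, nbytes in [("fp16/bf16", 2), ("fp32", 4)]:
    gib = weight_memory_gib(66e9, nbytes)
    print(f"OPT-66B weights in {precision}: ~{gib:.1f} GiB")
```

Under these assumptions the fp16 weights come to roughly 123 GiB, which is why the full-module copy to one device is a likely place for the allocation to fail.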

fix: the source part should not participate in the loss calculation in the SFT stage

Fixes https://github.com/microsoft/DeepSpeedExamples/issues/660. In the SFT stage, the source part must not contribute to the loss calculation; only the completion part should be considered. To address this issue, I've...

xffxff
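The PR's own change is truncated above, so the following is not its implementation. As a hedged illustration, one common way to keep the source part out of the loss is to set its label positions to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so those positions are skipped. The helpers `build_sft_labels` and `masked_mean_nll` are hypothetical and written in plain Python so the masking logic is visible; real training code would do the same with tensors. The sketch assumes per-token log-probabilities are already aligned with the labels (in a causal LM the logit at position i predicts token i+1; Hugging Face causal-LM models apply that shift internally when `labels` are passed).

```python
from typing import List, Sequence, Tuple

# PyTorch's CrossEntropyLoss skips targets equal to ignore_index (default -100).
IGNORE_INDEX = -100


def build_sft_labels(source_ids: Sequence[int],
                     completion_ids: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Concatenate source and completion; mask the source positions in the labels.

    Hypothetical helper: the model still sees the source as input context,
    but only completion tokens carry a training target.
    """
    input_ids = list(source_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(source_ids) + list(completion_ids)
    return input_ids, labels


def masked_mean_nll(token_logprobs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood over non-masked positions only.

    Assumes token_logprobs[i] is the model's log-probability of labels[i],
    i.e. the causal shift has already been applied.
    """
    kept = [-lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no completion tokens to train on")
    return sum(kept) / len(kept)


input_ids, labels = build_sft_labels([101, 102, 103], [7, 8])
print(labels)  # [-100, -100, -100, 7, 8]
print(masked_mean_nll([-1.0, -2.0, -3.0, -0.5, -1.5], labels))  # 1.0
```

Because masked positions are dropped from both the sum and the count, the result is an average over completion tokens only; the large losses on the three source tokens have no effect.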

DeepSpeed-VisualChat Tensor shape mismatch

When I use the CLI, I get some errors. Is this a bug? Enter image pathes, seperate by space (only support one image per time for now) (type 'na'...

Linjiahua

Overflow in deepspeed-chat LoRA and BF16 mode

- Example: DeepSpeed-Chat
- Model: Llama2-7b-hf
- Mode: LoRA, lora_dim=128
- Precision: FP16
- Output log as below:
- Question: Does the log mean it's training correctly? I found the...

THULiusj
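For context on the overflow messages: with FP16 (the precision listed above), DeepSpeed can use dynamic loss scaling, enabled by setting `loss_scale` to 0 in the `fp16` section of the config. The general idea is that a step whose scaled gradients overflow is skipped and the scale is lowered, while a run of overflow-free steps raises it again. The class below is a simplified, hypothetical sketch of that mechanism, not DeepSpeed's implementation; the starting scale of 2**16, the 1000-step window, and the name `DynamicLossScaler` are assumptions for illustration, and extra options such as hysteresis are omitted.

```python
class DynamicLossScaler:
    """Simplified dynamic loss scaler (illustrative, not DeepSpeed's code).

    On overflow: skip the optimizer step and halve the scale (not below min_scale).
    After `window` consecutive overflow-free steps: double the scale.
    """

    def __init__(self, init_scale: float = 2.0**16, window: int = 1000,
                 min_scale: float = 1.0) -> None:
        self.scale = init_scale
        self.window = window
        self.min_scale = min_scale
        self._good_steps = 0  # consecutive steps without overflow

    def update(self, overflow: bool) -> bool:
        """Record one step; return True if the optimizer step should be applied."""
        if overflow:
            self.scale = max(self.scale / 2, self.min_scale)
            self._good_steps = 0
            return False  # skip this update
        self._good_steps += 1
        if self._good_steps % self.window == 0:
            self.scale *= 2  # stable for a full window: try a larger scale
        return True


scaler = DynamicLossScaler()
applied = scaler.update(overflow=True)
print(applied, scaler.scale)  # False 32768.0
```

Under this scheme an overflow costs one skipped update rather than corrupting the weights. BF16 has the same exponent range as FP32, so it is normally run without loss scaling.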

Errors in cifar training and compression

I am trying to run the CIFAR example. For the one in the training folder I get this error, and for the one in the compression folder I get a different error. Python=3.9.16, PyTorch=1.13.0, DeepSpeed=0.9.5, CUDA=11.7...

Samanthavsilva

Does the DeepSpeedVisualChat model have the capability to locate targets, such as generating coordinates for bounding box positions?

May I ask for some guidance or advice, please?

Watebear

DeepSpeed-Chat Step-1 training error

Hi, I cannot run step-1 SFT training after the refactoring. `pip install deepspeed>=0.9.0` I did this in the folder applications/DeepSpeed-Chat:
```
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt...
```

yifan-bao

DeepSpeed-Chat Step-3 tensorboard loss figures with multiple training epochs

Hi, when training RLHF step 3, I set the epoch-related parameters as follows:
- ppo_epochs = 1
- num_train_epochs = 30

and I found that the numbers of lines in "actor_loss",...

GeekDream-x

The reward in step 3 seems completely random, with no noticeable increase.

I am testing the 1.3B model training. Steps 1 and 2 completed successfully, but the reward does not change after completing step 3. I used LoRA to train for...

laoda513

