DeepSpeedExamples issues

Results 274 DeepSpeedExamples issues

My model performs badly... Is GPU memory too small?

Hi! I trained the model just as you directed, but the model's generation is very bad. It cannot even produce a complete sentence... And when I train step3, its reward...

Trace2333

bug

deespeed chat

New training: Alpaca-lora-zero3 on 2080Ti

We added a new example to fine-tune LLaMA on 2080Ti-level GPUs. In my environment, with eight 2080Ti GPUs, LLaMA-7B can be fine-tuned on the alpaca-52k dataset at a speed of 1.5...

bigeagle

Error when running example of reward model training.

Hello, I'm running the example script for single-node reward model training at this [link](https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node) and get an error log like the one below: ```Traceback (most recent call last): File "/home/bingxing2/gpuuser183/bak/xydu/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py", line 348,...

Luoyang144

If I use a self-improved transformer architecture, can it be supported?

The customized model is not in your "Supported Models" list. Can it benefit from DeepSpeed Chat?

liujuncn

RuntimeError: Step 1 exited with non-zero status 1

After the install finished successfully, I got this error when I ran this command: `python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --num-gpus 1` ---=== Running Step 1 ===--- Traceback (most recent call last):...

yudonglee

SFT loss

https://github.com/microsoft/DeepSpeedExamples/blob/d570b2cc8a8fd4207c9424744669437d4c68ec43/applications/DeepSpeed-Chat/training/utils/data/data_utils.py#L122

```
if self.train_phase == 1:
    return {
        "input_ids": self.chosen_dataset[idx]["input_ids"],
        "attention_mask": self.chosen_dataset[idx]["attention_mask"],
        "labels": self.chosen_dataset[idx]["input_ids"]
    }
```

In the SFT stage, `input_ids` and `labels` are the same, so the loss calculation...

ruidongtd

question

deespeed chat
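As background for the SFT loss question above: passing the same token IDs as both inputs and labels is a common convention for causal language models, because the next-token shift is usually applied inside the loss computation rather than in the dataset. The sketch below illustrates that shift in plain Python. It is only an illustration under that assumption, not the DeepSpeed-Chat implementation; `causal_lm_loss` and the toy logits are hypothetical names and values introduced here.

```python
import math

def causal_lm_loss(logits, labels):
    """Mean next-token cross-entropy (hypothetical helper, not from the repo).

    logits[t] holds the unnormalized scores produced at position t, which are
    compared against the token at position t + 1.
    """
    shifted_logits = logits[:-1]  # the last position has no next token to predict
    shifted_labels = labels[1:]   # the first token is never a prediction target
    total = 0.0
    for row, target in zip(shifted_logits, shifted_labels):
        # cross-entropy = log-sum-exp of the scores minus the target's score
        log_norm = math.log(sum(math.exp(score) for score in row))
        total += log_norm - row[target]
    return total / len(shifted_labels)

# Toy example: vocabulary of 3 tokens, sequence of 3 positions.
input_ids = [2, 0, 1]
labels = input_ids                 # same IDs, as in the phase-1 dataset snippet
logits = [[0.0, 0.0, 0.0]] * 3     # uniform scores at every position
loss = causal_lm_loss(logits, labels)
```

With uniform scores over three tokens, the loss equals ln 3. Changing the first label does not change the result, since the first token is never used as a target after the shift.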

Can the program support longer answer_seq and prompt_seq lengths?

I ran the test program with "python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --num-gpus 8". The program runs normally. But I modified the parameters max_answer_seq_len = 1024 and max_prompt_seq_len...

lljjgg

Can I train an opt-6.7B model on 4x4090 GPUs?

I have four 4090 GPUs, and I want to train opt-6.7B using DeepSpeed Chat. Is that possible? I mean I have no idea if I should get a...

eggqq007

bug

deespeed chat

The step2 scoring looks correct but the step3 model is talking gibberish

For the step2 scoring: `python3 training/step2_reward_model_finetuning/rw_eval.py --model_name_or_path output/reward-models/350m/ ==================Eval result============================ prompt: Human: Please tell me about Microsoft in a few sentence? Assistant: good_ans: Microsoft is a software company that develops,...

panganqi

bug

deespeed chat

fix-step3-readme

Fix the README in step3.

zhangfanTJU
