DeepSpeedExamples

Example models using DeepSpeed

Results: 323 DeepSpeedExamples issues

https://github.com/microsoft/DeepSpeedExamples/blob/bae2667824974ac13dac28712462c14a2e594150/applications/DeepSpeed-Chat/training/utils/model/reward_model.py#L103 What if we changed the loss to `torch.log(torch.sigmoid(c_truncated_reward.mean() - r_truncated_reward.mean()))` instead of `torch.log(torch.sigmoid(c_truncated_reward - r_truncated_reward)).mean()`? I think the InstructGPT paper uses the latter.

deepspeed chat
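The two formulations in the issue above are not equivalent: because log∘sigmoid is concave, averaging the reward differences before the log-sigmoid (the proposed change) is, by Jensen's inequality, always at least as large as averaging the per-position log-sigmoids (the current code). A minimal sketch with made-up reward tensors (shapes and values are illustrative, not taken from the repo):

```python
import torch

# Illustrative per-position rewards for a chosen (c) and rejected (r)
# response; these tensors are made up for this sketch.
c_truncated_reward = torch.tensor([1.2, 0.8, 1.5])
r_truncated_reward = torch.tensor([0.3, 0.9, 0.1])

# Current code at the linked line: log-sigmoid of each per-position
# difference, then the mean.
loss_current = torch.log(
    torch.sigmoid(c_truncated_reward - r_truncated_reward)
).mean()

# Proposed change in the issue: mean the rewards first, then a single
# log-sigmoid of the difference of the means.
loss_proposed = torch.log(
    torch.sigmoid(c_truncated_reward.mean() - r_truncated_reward.mean())
)

# Jensen's inequality for the concave log∘sigmoid:
# loss_proposed >= loss_current, with equality only when all
# per-position differences are identical.
print(loss_current.item(), loss_proposed.item())
```

Whether the per-position or mean-first form better matches the InstructGPT objective is exactly the question the issue raises; the sketch only shows that the choice changes the value being optimized.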

Add the ability to load an already-downloaded Hugging Face dataset from local disk, for machines that cannot connect to huggingface.co.

Yuren (羽人) is a multilingual instruction dataset, primarily in Chinese and English, suitable for DeepSpeed-Chat. https://huggingface.co/datasets/pleisto/yuren

```bash
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

# Note that usually LoRA needs to use larger learning rate
OUTPUT_PATH=/mnt/bn/simple-nas/mlx/users/zhangyawei.ywsq/playground/arnold_ywsq/DeepSpeedExamples/applications/DeepSpeed-Chat/save/actor-models/7b1_bloom_lora
mkdir -p $OUTPUT_PATH
deepspeed --master_port 25104...
```

Hi, I tried using your DeepSpeed-Chat example to train a `facebook/opt-1.3B` model with RLHF. I'm using a custom dataset of 500 examples. I updated the `data_utils.py` and `raw_datasets.py` files to...

deepspeed chat

I tried to reproduce the 13B RLHF training on 8x A100-80GB GPUs. In the default training script, https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/training_scripts/single_node/run_13b.sh, the `per_device_train_batch_size` and `per_device_mini_train_batch_size` are both 16, which is different...

deepspeed chat
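For context on why a per-device batch size of 16 matters when comparing setups, the effective global batch is the per-device size times the number of devices times gradient accumulation steps. A back-of-the-envelope sketch under the 8-GPU setup described in the issue above (the accumulation value is an assumption; the excerpt does not state it):

```python
# Back-of-the-envelope arithmetic only.
num_gpus = 8
per_device_train_batch_size = 16
gradient_accumulation_steps = 1  # assumed; not stated in the excerpt

global_batch_size = (
    num_gpus * per_device_train_batch_size * gradient_accumulation_steps
)
print(global_batch_size)  # → 128
```

Changing any one of the three factors changes the global batch, so reproductions on different hardware should adjust the other two to match.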