ZeyuTeng96

Results: 28 comments of ZeyuTeng96

> > > > What if logging_steps is set to 10 or higher? May I ask: if I run with the official bloom config and deepspeed config, does the lr = 0 problem also appear?

> > igscience/bloomz-7b1-mt", "data_path": "data/res/merge_data.json", "output_dir": "trained_models/bloom", "per_device_train_batch_size": 1, "num_epochs": 2, "learning_rate": 1e-5, "cutoff_len": 1024, "val_set_size": 1000, "val_set_rate": 0.1, "save_steps": 1000, "eval_steps": 1000, "logging_steps": 1, "gradient_accumulation_steps": 32 } > >...

> We will find some time to try it and see whether we can reproduce the problem. (Quoted from an email reply to [LianjiaTech/BELLE] Issue #134: "tried to get lr value before scheduler/optimizer started stepping, returning lr=0")

Hello, I ran the following experiment. The bloom config is: { "model_type": "bloom", "model_name_or_path": "bigscience/bloom-1b1", "data_path": "data/trans_1.json", "output_dir": "trained_models/bloom", "per_device_train_batch_size": 1, "num_epochs": 2, "learning_rate": 1e-5, "cutoff_len": 1024, "val_set_size": 1000, "val_set_rate": 0.1, "save_steps": 1000, "eval_steps": 1000, "logging_steps": 1, "gradient_accumulation_steps":...
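For reference, here is a minimal sketch of how a JSON config like the one above could be loaded and mapped onto transformers' TrainingArguments. The field names follow the config snippet above, but the file path and the loading code itself are illustrative assumptions, not BELLE's actual training script.

```python
import json
from transformers import TrainingArguments

# Load the fine-tuning config (hypothetical path; keys mirror the config above).
with open("configs/bloom_config.json") as f:
    cfg = json.load(f)

# Map the JSON fields onto HF TrainingArguments.
training_args = TrainingArguments(
    output_dir=cfg["output_dir"],
    per_device_train_batch_size=cfg["per_device_train_batch_size"],
    num_train_epochs=cfg["num_epochs"],
    learning_rate=cfg["learning_rate"],
    gradient_accumulation_steps=cfg["gradient_accumulation_steps"],
    save_steps=cfg["save_steps"],
    eval_steps=cfg["eval_steps"],
    logging_steps=cfg["logging_steps"],
)
```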

When fine-tuning with the 1b1 model alone, without DeepSpeed, the learning rate changes as follows:
{'loss': 2.6999, 'learning_rate': 5.263157894736843e-07, 'epoch': 0.01}
{'loss': 2.7946, 'learning_rate': 1.0526315789473685e-06, 'epoch': 0.02}
{'loss': 3.1472, 'learning_rate': 1.5789473684210526e-06, 'epoch': 0.03}
{'loss': 2.7722, 'learning_rate': 2.105263157894737e-06, 'epoch': 0.04}
{'loss': 2.9574, 'learning_rate': 2.631578947368421e-06, 'epoch':...

With deepspeed config 1, the learning rate changes as follows:
{'loss': 2.8091, 'learning_rate': 5.263157894736843e-07, 'epoch': 0.01}
{'loss': 2.8488, 'learning_rate': 1.0526315789473685e-06, 'epoch': 0.02}
{'loss': 2.9292, 'learning_rate': 1.5789473684210526e-06, 'epoch': 0.03}
{'loss': 2.8395, 'learning_rate': 2.105263157894737e-06, 'epoch': 0.04}
{'loss': 3.1188,...

With deepspeed config 2, the learning rate changes as follows:
tried to get lr value before scheduler/optimizer started stepping, returning lr=0
{'loss': 2.7112, 'learning_rate': 0, 'epoch': 0.01}
1%|▉ | 2/182 [01:05
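One way to see whether the learning rate stays at 0 or recovers after the first few steps, without changing logging_steps, is a small callback that prints the reported learning rate at every logging event. This is only an illustrative sketch using the standard transformers TrainerCallback API, not part of the BELLE code; the callback and its name are assumptions.

```python
from transformers import TrainerCallback

class LrLoggerCallback(TrainerCallback):
    """Print the learning rate reported at each logging step."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "learning_rate" in logs:
            print(f"step {state.global_step}: lr = {logs['learning_rate']}")

# Usage (assuming a Trainer instance named `trainer` already exists):
# trainer.add_callback(LrLoggerCallback())
```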

Looking into it, the Trainer's default optimizer appears to be AdamW, but the optimizer type in the official DeepSpeed config file is Adam. Also, if fp16 and an lr scheduler are added to the DeepSpeed config file, the learning rate is 0 for the first few steps. @xianghuisun
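To make the DeepSpeed optimizer match the Trainer default (AdamW) and to reduce how many early fp16 steps get skipped by dynamic loss scaling (a common reason the scheduler has not stepped yet, so the lr reads 0), a config along the lines of the sketch below could be tried. This is only an assumed example: the exact keys in BELLE's official config may differ, and the numeric fp16 values are placeholders.

```python
import json

# Sketch of a DeepSpeed config: AdamW to match transformers' default optimizer,
# "auto" values filled in by the HF/DeepSpeed integration, and a lower initial
# loss scale so fewer early fp16 steps are skipped. Placeholder values only.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto",
        },
    },
    "fp16": {
        "enabled": True,
        "initial_scale_power": 12,   # smaller initial loss scale -> fewer skipped early steps
        "loss_scale_window": 100,
    },
}

# Could be written to disk and passed via --deepspeed, or handed directly to
# TrainingArguments(deepspeed=ds_config). Filename here is hypothetical.
with open("ds_config_adamw.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```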

One more question: when doing instruction fine-tuning with bloom, is it necessary to expand the vocabulary of the bloom-7b1 model? @xianghuisun
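For context on the vocabulary question, a quick way to check whether the tokenizer and the model's embedding table already agree in size, and to resize only when new tokens are actually added, is sketched below. The model name is taken from the config quoted earlier; the extra special tokens are purely hypothetical examples.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "bigscience/bloomz-7b1-mt"  # from the config quoted above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

print("tokenizer vocab size:", len(tokenizer))
print("embedding rows      :", model.get_input_embeddings().weight.shape[0])

# Only if extra tokens are added (these markers are hypothetical examples)
# does the embedding table need to be resized:
added = tokenizer.add_special_tokens({"additional_special_tokens": ["<human>", "<assistant>"]})
if added > 0:
    model.resize_token_embeddings(len(tokenizer))
```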