NostalgiaOfTime
I see that many of the shell scripts include `--actor_lora_module_name` but not `--only_optimize_lora`. According to the source code, this causes all model parameters to be trained instead of only the LoRA layers....
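For context, here is a minimal sketch of what `--only_optimize_lora` effectively does in the training code (a paraphrase, not the exact source; the `"lora_"` name check is an assumption of this sketch): it freezes every parameter whose name does not look like a LoRA weight, so omitting the flag leaves the full base model trainable even though LoRA layers were injected via `--actor_lora_module_name`.

```python
import torch.nn as nn

def only_optimize_lora_parameters(model: nn.Module) -> nn.Module:
    """Freeze everything except the injected LoRA weights.

    Paraphrase of what --only_optimize_lora triggers in DeepSpeed-Chat;
    the "lora_" substring check below is an assumption for this sketch.
    """
    for name, param in model.named_parameters():
        param.requires_grad = "lora_" in name
    return model

# Without this step, injecting LoRA only adds the low-rank matrices on
# top of the base linear layers; the original weights keep
# requires_grad=True and are trained (and consume optimizer state) too.
```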
**Describe the bug**
4x 80GB A100s don't seem to be able to train the 7B BLOOM model with LoRA at a per-device batch size of 4, while ColossalAI can, which is confusing. After comparing the two, I can only set the batch size to 1.

**To Reproduce**
Below is the script I slightly modified to adapt to BLOOM (the official repo only provides a script adapted for Facebook's OPT). The official docs point out that gradient_checkpointing and only_optimize_lora conflict, so I only used only_optimize_lora.

```bash
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

# Note that usually LoRA needs to use larger learning rate
OUTPUT_PATH=/mnt/bn/simple-nas/mlx/users/zhangyawei.ywsq/playground/arnold_ywsq/DeepSpeedExamples/applications/DeepSpeed-Chat/save/actor-models/7b1_bloom_lora
mkdir -p $OUTPUT_PATH

deepspeed --master_port 25104 --num_gpus 4 main.py \
   --data_path xxx \
   --data_split 10,0,0 \
   --model_name_or_path xxx \
   --per_device_train_batch_size ...
```
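A quick way to check whether only the LoRA parameters are actually being optimized (and hence whether full-model optimizer states are what is eating the memory) is to count trainable parameters right after the model is set up. This is a hypothetical helper for debugging, not part of DeepSpeed-Chat:

```python
def report_trainable_parameters(model) -> None:
    """Print how many parameters will actually receive gradients.

    If this prints ~100% even with --only_optimize_lora on the command
    line, the flag is not taking effect and the full 7B model (plus its
    Adam states) is being trained, which would explain running out of
    memory at per_device_train_batch_size=4.
    """
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable:,} / {total:,} "
          f"({100.0 * trainable / total:.2f}%)")
```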