NostalgiaOfTime
Is there any update? I'm still confused about this.
@ScottishFold007 Looking at the source code, it seems float16 is used by default, and gradient_checkpointing and only_optimize_lora cannot be used at the same time, so with the officially released code you have to give up gradient_checkpointing if you want LoRA. In principle it shouldn't take up this much GPU memory: with LoRA, each layer only has two low-rank matrices that need backpropagation, so the number of parameters actually being optimized is very small.
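For reference, here is a minimal sketch (not the DeepSpeed-Chat code) of a LoRA-style linear layer with a frozen base weight and two trainable low-rank matrices, just to show how small the trainable parameter count is; the class name, rank, and dimensions are illustrative assumptions:

```python
# Minimal sketch of a LoRA-style layer: frozen base weight plus two trainable
# low-rank factors A and B. Names and sizes are illustrative, not the real code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8):
        super().__init__()
        # Frozen pretrained weight: no gradients, so it is not optimized.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        # The only trainable parameters: two low-rank matrices per layer.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x):
        return x @ self.weight.T + (x @ self.lora_A.T) @ self.lora_B.T

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / total: {total}")
# ~65K trainable vs ~16.8M total for this single layer
```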
The source code states that "gradient_checkpointing" and "only_optimize_lora" cannot be used at the same time, so you can either decrease the batch size or drop only_optimize_lora.
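The check looks roughly like the hypothetical sketch below; the flag names follow this thread, but this is not the exact DeepSpeed-Chat code:

```python
# Hypothetical sketch of the mutual-exclusion check described above.
# Flag names are taken from this thread; the actual script may differ.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gradient_checkpointing", action="store_true")
parser.add_argument("--only_optimize_lora", action="store_true")
parser.add_argument("--per_device_train_batch_size", type=int, default=8)
args = parser.parse_args()

if args.gradient_checkpointing and args.only_optimize_lora:
    raise ValueError(
        "gradient_checkpointing and only_optimize_lora cannot be enabled together; "
        "drop one of them, or reduce --per_device_train_batch_size instead."
    )
```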
@yaozhewei In the current framework, gradient checkpointing and only_optimize_lora apparently cannot be used at the same time.
Same here: the error occurs when I use ZeRO-3, but it runs correctly with ZeRO-1.
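For anyone trying to reproduce this, the ZeRO stage is controlled by the `zero_optimization.stage` field of the DeepSpeed config. A minimal sketch as a Python dict (values here are placeholders, not the script's defaults):

```python
# Minimal DeepSpeed config sketch; switching "stage" between 3 and 1 is what
# the comment above refers to. Batch size and fp16 settings are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,  # the error reportedly appears with stage 3 but not stage 1
    },
}
```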