NostalgiaOfTime

5 comments from NostalgiaOfTime

@ScottishFold007 From the source code, it looks like float16 is used by default, and gradient_checkpointing and only_optimize_lora cannot be used at the same time, so the officially released code forces you to give up gradient_checkpointing if you want LoRA. In principle it should not take up this much GPU memory: with LoRA, each layer has only two low-rank matrices that need backpropagation, so the number of parameters actually being optimized is very small.
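A minimal sketch of the parameter freezing described above, assuming the low-rank matrices are named lora_A and lora_B (the naming is an assumption, not the official implementation):

```python
import torch.nn as nn

def only_optimize_lora_parameters(model: nn.Module) -> nn.Module:
    """Freeze every parameter except the two low-rank LoRA matrices per layer."""
    for name, param in model.named_parameters():
        # "lora_A" / "lora_B" are assumed naming conventions; adjust to your codebase.
        param.requires_grad = "lora_A" in name or "lora_B" in name
    return model

def count_trainable(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts to confirm the tiny optimized set."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total
```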

The source code states that "gradient_checkpointing" and "only_optimize_lora" cannot be used at the same time, so you can either decrease the batch size or remove only_optimize_lora.
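A sketch of the kind of mutual-exclusion check the comment refers to; the flag names mirror the comment, but the exact argument handling in the official code may differ:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gradient_checkpointing", action="store_true")
parser.add_argument("--only_optimize_lora", action="store_true")
args = parser.parse_args()

# The two options conflict, so fail fast instead of training with a broken setup.
if args.gradient_checkpointing and args.only_optimize_lora:
    raise ValueError(
        "gradient_checkpointing and only_optimize_lora cannot be used at the "
        "same time; disable one of them (or reduce the batch size instead).")
```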

Same here: the error occurs when I use ZeRO-3, but it runs correctly with ZeRO-1.
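For reference, a hypothetical sketch of the workaround of dropping the ZeRO stage from 3 to 1; the keys follow the standard DeepSpeed config schema, and the batch size value is illustrative:

```python
# Assumed DeepSpeed config: stage 3 reproduced the error, stage 1 ran fine.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # illustrative value
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,  # was 3; lowering the stage avoided the error
    },
}
```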