LOMO
LOMO: LOw-Memory Optimization
Hi @QipengGuo @KaiLv69 @ayyyq, thanks for the nice work. I am wondering how to calculate the detailed GPU memory usage reported in the paper, such as the results in...
Running this raised the error RuntimeError: stack expects a non-empty TensorList. I checked the code and the list is indeed empty. How should this be resolved?
Dear authors, it is nice to see this amazing work. When I ran the code, I noticed something interesting: when loading the model, it occupies more GPU memory....
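Hello, conventional chain-rule backpropagation uses the gradient results computed in the previous step, but the method in the paper does not store gradients after the update. Does this mean that later gradient computations involve extra repeated computation, similar to trading time for space? Is this understanding correct?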
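One way to look at the question above is a toy sketch, not LOMO's actual code. It assumes a chain of scalar layers `y = w[n-1] * ... * w[0] * x` with a squared-error loss, and the function names `sgd_store_all` and `sgd_fused` are hypothetical. What flows backward through the chain is the gradient with respect to the activations; a layer's parameter gradient is only needed for that layer's own update. So in this toy setting the parameter gradient can be consumed and dropped immediately, as long as the activation gradient is computed with the pre-update weight, and nothing is recomputed.

```python
def forward(w, x):
    """Run the scalar chain and keep every activation for the backward pass."""
    acts = [x]
    for wi in w:
        acts.append(wi * acts[-1])
    return acts


def sgd_store_all(w, x, target, lr):
    """Standard SGD step: store all parameter gradients, then update."""
    acts = forward(w, x)
    grad_out = 2.0 * (acts[-1] - target)  # dL/dy for L = (y - target)^2
    grads = [0.0] * len(w)
    for i in reversed(range(len(w))):
        grads[i] = grad_out * acts[i]     # dL/dw_i, kept until the end
        grad_out = grad_out * w[i]        # dL/d(input of layer i)
    return [wi - lr * gi for wi, gi in zip(w, grads)]


def sgd_fused(w, x, target, lr):
    """Fused step: update each weight as soon as its gradient is known."""
    w = list(w)                           # do not mutate the caller's list
    acts = forward(w, x)
    grad_out = 2.0 * (acts[-1] - target)
    for i in reversed(range(len(w))):
        grad_w = grad_out * acts[i]       # only one parameter gradient is live
        grad_out = grad_out * w[i]        # must use the pre-update weight
        w[i] -= lr * grad_w               # grad_w can be discarded after this
    return w


if __name__ == "__main__":
    weights = [0.5, -1.5, 2.0]
    print(sgd_store_all(weights, 3.0, 1.0, 0.01))
    print(sgd_fused(weights, 3.0, 1.0, 0.01))
```

Both functions do the same number of multiplications per layer; the fused version just holds one parameter gradient at a time instead of a full list. Whether the real implementation needs extra passes in some configurations is a question for the authors.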