LOMO issues

How to calculate the used GPU memory for each part as in the paper?

Comments: 2

Hi @QipengGuo @KaiLv69 @ayyyq, thanks for the nice work. I am wondering how to calculate the detailed GPU memory usage illustrated in the paper, such as the results in...

liming-ai
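For the model-state part of this question, a back-of-envelope estimate is the parameter count times the bytes each state costs per parameter. The sketch below assumes a common mixed-precision AdamW accounting: fp16 weights and gradients, plus an fp32 master copy and two fp32 moments, for 16 bytes per parameter in total. These byte costs and the 7e9 parameter count are illustrative assumptions, not necessarily how the paper measured its numbers, and activations are not included. For measured rather than estimated numbers, PyTorch's torch.cuda.max_memory_allocated() reports the peak GPU memory occupied by tensors.

```python
GIB = 1024 ** 3

# Assumed per-parameter byte costs for mixed-precision AdamW training
# (a common accounting, not necessarily the one used in the paper).
ADAMW_MIXED_PRECISION = {
    "params_fp16": 2,
    "grads_fp16": 2,
    "optimizer_states_fp32": 12,  # fp32 master weights + two Adam moments
}


def model_state_memory_gib(n_params, bytes_per_param):
    """Return GiB per model-state category; activations are NOT included."""
    return {name: n_params * cost / GIB for name, cost in bytes_per_param.items()}


if __name__ == "__main__":
    # Hypothetical 7B-parameter model, for illustration only.
    breakdown = model_state_memory_gib(7e9, ADAMW_MIXED_PRECISION)
    for name, gib in breakdown.items():
        print(f"{name:>24}: {gib:6.1f} GiB")
    print(f"{'total':>24}: {sum(breakdown.values()):6.1f} GiB")
```

Swapping in other byte costs (for example, dropping the optimizer-state entry for plain SGD without momentum, which keeps no state) shows how each part of the breakdown shifts.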

Is LOMO capable of pre-training an LLM from scratch as well?

Comments: 2

YuxingLu613

About torch.stack(self.grad_norms)

Comments: 3

Running this raises RuntimeError: stack expects a non-empty TensorList. Looking at the code, the list is indeed empty. How can this be fixed?

jinzitian
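torch.stack raises this error whenever it is given an empty list, so the message means no per-parameter gradient norm had been collected when the list was stacked. Why that happens in this particular setup is not established here. The sketch below is a hypothetical, framework-free helper (global_grad_norm is not part of LOMO) that combines per-parameter norms the same way torch.norm(torch.stack(norms), p) would, but handles the empty case explicitly. Returning 0.0 is only one option. If an empty list signals a misconfiguration, raising a clearer error is probably better.

```python
import math


def global_grad_norm(grad_norms, norm_type=2.0):
    """Combine per-parameter gradient norms into one global p-norm.

    Hypothetical helper: equivalent to norm(stack(grad_norms), norm_type)
    for a list of scalar norms, but it does not crash on an empty list.
    """
    if not grad_norms:
        # torch.stack([]) would raise "stack expects a non-empty TensorList".
        # An empty list means no norm was recorded; 0.0 is one choice here.
        return 0.0
    if math.isinf(norm_type):
        # The infinity norm is the largest absolute per-parameter norm.
        return max(abs(n) for n in grad_norms)
    return sum(abs(n) ** norm_type for n in grad_norms) ** (1.0 / norm_type)
```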

Memory consumption first rises, then falls.

Comments: 3

Dear authors, it is great to see this amazing work. When running the code, I noticed an interesting phenomenon: when loading the model, it occupies more GPU memory...

zhenqin96

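Some confusion about the method in the paper

Comments: 3

Hello, in conventional backpropagation the chain rule reuses the gradients computed in the previous step, but the method in the paper does not store gradients after the update. Does this mean later gradient computations have to repeat some work, similar to trading time for space? Is this understanding correct?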

JorunoJobana
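On the backpropagation question: the chain rule passes gradients of activations from layer to layer, while a parameter's gradient is consumed only by that parameter's own update. The toy scalar example below (two weights, plain SGD, loss 0.5 * (out - t)^2) is not LOMO's implementation. It updates each weight as soon as its gradient exists and then discards that gradient, and it produces exactly the same weights as computing all gradients first. The one ordering requirement it illustrates is that the activation gradient must be computed from the old weight value before that weight is overwritten.

```python
def forward(w1, w2, x):
    """Two-layer scalar chain: h = w1 * x, out = w2 * h."""
    h = w1 * x
    out = w2 * h
    return h, out


def sgd_standard(w1, w2, x, t, lr):
    """Compute every parameter gradient first, then apply the update."""
    h, out = forward(w1, w2, x)
    g_out = out - t              # dL/dout for L = 0.5 * (out - t) ** 2
    g_w2 = g_out * h             # gradient of the parameter w2
    g_h = g_out * w2             # gradient passed back to the activation h
    g_w1 = g_h * x               # gradient of the parameter w1
    return w1 - lr * g_w1, w2 - lr * g_w2


def sgd_fused(w1, w2, x, t, lr):
    """Update each parameter as soon as its gradient exists, then drop it."""
    h, out = forward(w1, w2, x)
    g_out = out - t
    # Last layer: its backward step yields both the parameter gradient and
    # the activation gradient, both computed from the old value of w2.
    g_w2, g_h = g_out * h, g_out * w2
    w2 = w2 - lr * g_w2
    del g_w2                     # never needed again by the chain rule
    g_w1 = g_h * x               # uses g_h, not g_w2: nothing is recomputed
    w1 = w1 - lr * g_w1
    return w1, w2
```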
