
Gradient accumulation

Open · EladDv opened this issue 1 year ago · 1 comment

Is gradient accumulation still mathematically possible in this scheme? If so, we could train a 65B model on a single 3090 in a day and a half.

EladDv · Jun 19 '23 17:06
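For context: standard gradient accumulation works because each parameter's `.grad` buffer persists in GPU memory across micro-batches, so several small backward passes can be summed before a single optimizer step. A minimal PyTorch sketch of that pattern (model and batch sizes are illustrative only):

```python
import torch

model = torch.nn.Linear(16, 2)                     # stand-in for a large model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
accum_steps = 4                                    # micro-batches per optimizer step

optimizer.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(8, 16)                         # stand-in micro-batch
    y = torch.randint(0, 2, (8,))
    loss = torch.nn.functional.cross_entropy(model(x), y) / accum_steps
    loss.backward()                                # sums into param.grad; buffers persist
optimizer.step()                                   # one update from the accumulated gradients
optimizer.zero_grad()
```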

No, the gradient is no longer preserved in GPU memory. If you offload the gradient tensor to CPU memory or NVMe, the transfer cost is large.

QipengGuo · Jun 19 '23 17:06
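To illustrate why there is nothing left to accumulate: LOMO fuses the parameter update into the backward pass, so each gradient is consumed as soon as it is produced instead of being kept in a `.grad` buffer. The sketch below shows that general fused-update idea; it is not LOMO's actual code (names, sizes, and the plain SGD rule are assumptions), and it uses `register_post_accumulate_grad_hook`, which is only available in PyTorch 2.1 and later.

```python
import torch

lr = 1e-3
model = torch.nn.Linear(16, 2)                     # stand-in for a large model

def fused_sgd_hook(param: torch.Tensor) -> None:
    # Called right after param.grad has been populated during backward.
    param.data.add_(param.grad, alpha=-lr)         # apply an SGD-style update in place
    param.grad = None                              # free the gradient immediately

for p in model.parameters():
    p.register_post_accumulate_grad_hook(fused_sgd_hook)

x = torch.randn(8, 16)                             # stand-in micro-batch
y = torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()                                    # weights are updated during backward;
                                                   # no .grad buffer survives to accumulate into
```

Because the gradients are discarded inside the hooks, a second micro-batch has nothing to add onto; each backward pass immediately changes the weights instead.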

Based on the information provided, we consider this issue resolved. If you have any further questions or concerns, please reopen this issue and provide additional details.

KaiLv69 · Jun 24 '23 16:06