TimeMixer
GPU Memory Optimization Needed
Memory usage is too high: training larger model configurations crashes with out-of-memory errors 💥
Problem
- Large batch sizes cause OOM errors on standard GPUs
- No gradient accumulation support
- No memory-efficient attention is used
Suggested fix
Add a gradient accumulation parameter so gradients build up over several small batches before each optimizer step, cutting peak memory:
# pseudo-code
args.gradient_accumulation = 4  # accumulate gradients over 4 micro-batches

optimizer.zero_grad()  # start from clean gradients
for i, (batch_x, ...) in enumerate(data_loader):
    # Forward pass
    outputs = model(batch_x, ...)
    # Scale the loss so accumulated gradients average over the steps
    loss = criterion(outputs, targets) / args.gradient_accumulation
    # Backward pass (gradients accumulate in-place)
    loss.backward()
    if (i + 1) % args.gradient_accumulation == 0:
        optimizer.step()
        optimizer.zero_grad()
This would enable training bigger models and longer sequences without needing a bigger GPU 🚀
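On the memory-efficient attention point: a minimal sketch of what that could look like, assuming PyTorch 2.x is available. `torch.nn.functional.scaled_dot_product_attention` dispatches to a memory-efficient or FlashAttention kernel when one applies, instead of materializing the full seq_len × seq_len attention matrix; the tensor shapes here are made up for illustration, not taken from TimeMixer.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch 2, 4 heads, sequence length 96, head dim 32
q = torch.randn(2, 4, 96, 32)
k = torch.randn(2, 4, 96, 32)
v = torch.randn(2, 4, 96, 32)

# Fused attention: avoids allocating the explicit (96 x 96) score matrix
# on backends that support the memory-efficient / flash kernels
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 96, 32])
```

Swapping a hand-rolled `softmax(q @ k.T) @ v` for this call usually needs no other code changes, since the output shape matches the query shape.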