TimeMixer
GPU Memory Optimization Needed
Memory usage is too high: training larger model configurations crashes with out-of-memory errors 💥
Problem
- Large batch sizes cause OOM errors on standard GPUs
- No gradient accumulation support
- No memory-efficient attention is used
Suggested fix
Add a gradient accumulation parameter so gradients build up over several small batches before each optimizer step, cutting peak memory:
# pseudo-code
args.gradient_accumulation = 4  # accumulate gradients over 4 micro-batches

optimizer.zero_grad()  # start from clean gradients
for i, (batch_x, ...) in enumerate(data_loader):
    # Forward pass
    outputs = model(batch_x, ...)
    # Scale the loss so accumulated gradients average over the steps
    loss = criterion(outputs, targets) / args.gradient_accumulation
    # Backward pass (gradients accumulate in-place)
    loss.backward()
    if (i + 1) % args.gradient_accumulation == 0:
        optimizer.step()
        optimizer.zero_grad()
This would enable training bigger models and longer sequences without needing a bigger GPU 🚀
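On the memory-efficient attention point: a minimal sketch of what that could look like, assuming PyTorch 2.x is available. `torch.nn.functional.scaled_dot_product_attention` dispatches to a memory-efficient or FlashAttention kernel when one applies, instead of materializing the full seq_len × seq_len attention matrix; the tensor shapes here are made up for illustration, not taken from TimeMixer.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch 2, 4 heads, sequence length 96, head dim 32
q = torch.randn(2, 4, 96, 32)
k = torch.randn(2, 4, 96, 32)
v = torch.randn(2, 4, 96, 32)

# Fused attention: avoids allocating the explicit (96 x 96) score matrix
# on backends that support the memory-efficient / flash kernels
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 96, 32])
```

Swapping a hand-rolled `softmax(q @ k.T) @ v` for this call usually needs no other code changes, since the output shape matches the query shape.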