Quadratic memory usage increase when increasing batch size

Open LckyLke opened this issue 1 month ago • 0 comments

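For different values of batch_size I observed a quadratic increase in memory usage: each doubling of the batch size roughly quadruples the size of the failed allocation, i.e.: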

256 -> CUDA out of memory. Tried to allocate 37.61 GiB
512 -> ... 150.42 GiB
1024 -> ... 601.08 GiB
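As a quick sanity check on the "quadratic" claim, the scaling exponent between consecutive measurements can be estimated from the three reported allocation sizes. This is only a sketch over the numbers quoted above; the helper name `scaling_exponents` is made up for illustration.

```python
import math

# Allocation sizes (GiB) from the OOM messages above, keyed by batch size.
reported = {256: 37.61, 512: 150.42, 1024: 601.08}

def scaling_exponents(points):
    """Estimate k in memory ~ batch_size**k between consecutive measurements."""
    sizes = sorted(points)
    return [
        math.log(points[b] / points[a]) / math.log(b / a)
        for a, b in zip(sizes, sizes[1:])
    ]

exponents = scaling_exponents(reported)
# An exponent close to 2 means memory grows with the square of the batch size.
print([round(k, 3) for k in exponents])
```

Both estimated exponents come out within 0.01 of 2, which is consistent with quadratic rather than linear growth.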

I could not find this issue mentioned in the paper. I think it is caused by the large one-hot vectors created in the top-any routing, i.e. https://github.com/LINs-lab/DynMoE/blob/49272326e794b24bbebe3a9c6df3c079b3dd887b/DeepSpeed-0.9.5/deepspeed/moe/sharded_moe.py#L476C1-L476C100

I was able to improve this slightly by not creating these one-hot vectors (~50% memory reduction), but memory still grows quadratically. Is this issue known? Are there any improvements or configurations that mitigate it?
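One possible explanation, sketched below under an assumption that has not been verified against the linked line: in GShard-style gating, the dispatch and combine tensors have shape [tokens, experts, capacity], and capacity itself is proportional to the number of tokens. If the top-any routing builds tensors of that shape, the element count grows with the square of the token count, and dropping one intermediate one-hot tensor would only remove a constant factor. The function names and the `capacity_factor` default here are hypothetical and not taken from the DynMoE code.

```python
import math

def capacity(num_tokens, num_experts, capacity_factor=1.0):
    # Assumed GShard-style rule: per-expert capacity scales with the token count.
    return math.ceil(num_tokens / num_experts * capacity_factor)

def dispatch_elements(num_tokens, num_experts, capacity_factor=1.0):
    # Element count of a hypothetical [tokens, experts, capacity] tensor.
    return num_tokens * num_experts * capacity(num_tokens, num_experts, capacity_factor)

seq_len, num_experts = 128, 8  # illustrative values, not from the issue
counts = [dispatch_elements(bs * seq_len, num_experts) for bs in (256, 512, 1024)]
ratios = [later / earlier for earlier, later in zip(counts, counts[1:])]
print(counts, ratios)
```

Under these assumptions each doubling of the batch size multiplies the element count by 4, which matches the pattern in the reported allocation sizes. The expert count cancels out of the product, so the total depends only on the token count and the capacity factor.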

Thanks :)

LckyLke, Nov 12 '25 16:11