sreetamasarkar
Yes, the memory values I reported were measured using `torch.cuda.memory_allocated()`.
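For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of reading `torch.cuda.memory_allocated()` before and after a forward pass. The helper name `allocated_delta_mib` and the `torch.nn.Linear` stand-in layer are assumptions made for illustration; they are not the MoE layer or the exact script discussed in this thread.

```python
import torch


def allocated_delta_mib(fn, *args, **kwargs):
    """Run fn and return (result, change in allocated CUDA memory in MiB).

    Hypothetical helper for illustration only.
    """
    if not torch.cuda.is_available():
        # Without a GPU there is nothing to measure, so report zero.
        return fn(*args, **kwargs), 0.0
    torch.cuda.synchronize()  # let pending kernels finish before reading
    before = torch.cuda.memory_allocated()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    after = torch.cuda.memory_allocated()
    return result, (after - before) / 2**20


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(1024, 1024).to(device)  # stand-in for the real layer
x = torch.randn(64, 1024, device=device)

out, delta = allocated_delta_mib(layer, x)
print(f"forward pass allocated {delta:.2f} MiB")
```

Note that `torch.cuda.memory_allocated()` counts only memory currently occupied by tensors; memory held by the caching allocator shows up in `torch.cuda.memory_reserved()` instead, so the two numbers can differ.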
I was using a slightly modified layer inspired by [FMoETransformerMLP](https://github.com/VITA-Group/M3ViT/blob/d448b6fcfba70a661c9f3c42d3d72dba92c5f1e6/models/custom_moe_layer.py#L66). I observed that when I use NaiveGate, I do not have the memory issue. I suspect the memory increase might...
I was having the memory issue with a customized gate.