sreetamasarkar


Yes, the memory values I reported were measured using `torch.cuda.memory_allocated()`.
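A minimal sketch of how this kind of measurement could be taken, assuming a single device. The helper `allocated_delta_bytes` and the `torch.nn.Linear` stand-in layer are hypothetical names for illustration only, not the code used for the reported numbers. It falls back to reporting 0 when CUDA is not available.

```python
import torch


def allocated_delta_bytes(fn):
    """Run fn() and return (result, change in torch.cuda.memory_allocated() in bytes).

    Hypothetical helper for illustration. Without CUDA it reports a delta of 0.
    """
    if not torch.cuda.is_available():
        return fn(), 0
    torch.cuda.synchronize()  # finish pending kernels before reading the counter
    before = torch.cuda.memory_allocated()
    result = fn()
    torch.cuda.synchronize()
    after = torch.cuda.memory_allocated()
    return result, after - before


device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the layer under test (assumption: any nn.Module can be used here).
layer = torch.nn.Linear(256, 256).to(device)
x = torch.randn(32, 256, device=device)

out, delta = allocated_delta_bytes(lambda: layer(x))
print(f"forward pass kept {delta / 1024**2:.2f} MiB allocated on {device}")
```

Note that `torch.cuda.memory_allocated()` only counts memory currently held by tensors, so the delta reflects what is still alive after the call (outputs and activations saved for backward), not the peak during the call.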

I was using a slightly modified version inspired by [FMoETransformerMLP](https://github.com/VITA-Group/M3ViT/blob/d448b6fcfba70a661c9f3c42d3d72dba92c5f1e6/models/custom_moe_layer.py#L66). I observed that the memory issue does not occur when I use `NaiveGate`. I suspect the memory increase might...

I was having the memory issue with a customized gate.