sreetamasarkar comments

Results: 3 comments of sreetamasarkar

CUDA memory increases after each loss.backward()

Yes, the memory values I reported were measured using torch.cuda.memory_allocated().
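For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of one way to record `torch.cuda.memory_allocated()` after each training step and check whether it keeps growing. The helper names (`cuda_allocated_bytes`, `memory_per_step`, `grows_every_step`) are made up for illustration and are not part of PyTorch or FastMoE; the step function is assumed to run one full iteration (forward, `loss.backward()`, optimizer step).

```python
from typing import Callable, List


def cuda_allocated_bytes() -> int:
    """Bytes held by live tensors on the current CUDA device (0 if CUDA is unavailable)."""
    import torch  # imported lazily so the helpers below also work without a GPU

    if not torch.cuda.is_available():
        return 0
    torch.cuda.synchronize()  # wait for pending kernels before reading the counter
    return torch.cuda.memory_allocated()


def memory_per_step(
    step_fn: Callable[[], None],
    num_steps: int,
    read_bytes: Callable[[], int] = cuda_allocated_bytes,
) -> List[int]:
    """Call step_fn num_steps times and record the allocated bytes after each call."""
    readings = []
    for _ in range(num_steps):
        step_fn()  # hypothetical: one forward + loss.backward() + optimizer step
        readings.append(read_bytes())
    return readings


def grows_every_step(readings: List[int]) -> bool:
    """True if each reading is strictly larger than the one before it."""
    return len(readings) > 1 and all(b > a for a, b in zip(readings, readings[1:]))
```

In a real run, `step_fn` would wrap the training iteration and the default reader would be used. A reading that rises on every single step, rather than levelling off after the first few iterations, is the pattern described in this issue.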

CUDA memory increases after each loss.backward()

I was using a slightly modified version inspired by [FMoETransformerMLP](https://github.com/VITA-Group/M3ViT/blob/d448b6fcfba70a661c9f3c42d3d72dba92c5f1e6/models/custom_moe_layer.py#L66). I observed that when I use NaiveGate, I do not have the memory issue. I suspect the memory increase might...

CUDA memory increases after each loss.backward()

I was having the memory issue with a customized gate.