Is requires_grad mandatory?
Does every tensor used in TE need to have `requires_grad=True`?
I needed to add a dummy tensor for compatibility purposes to get activation checkpointing working with TE in the Megatron-DeepSpeed PR here. I had to set `requires_grad=True` on it for TE to work, and I'm wondering whether that is always the case.
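For context, a minimal sketch of the dummy-tensor pattern outside of TE/Megatron (the `run` function and `dummy` name here are illustrative, not the actual PR code): with reentrant activation checkpointing, if none of the inputs to the checkpointed function require grad, the output does not require grad either and backward through the checkpointed region breaks, even though the layer inside has trainable parameters. Passing an extra tensor with `requires_grad=True` works around this.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A block whose learnable state lives inside the module, not in its inputs.
layer = torch.nn.Linear(8, 8)

def run(x, dummy):
    # `dummy` is only here so checkpoint() sees an input with
    # requires_grad=True; adding zeros does not change the output value.
    return layer(x) + dummy

x = torch.randn(2, 8)                       # activation: requires_grad=False
dummy = torch.zeros(1, requires_grad=True)  # hypothetical dummy tensor (the workaround)
out = checkpoint(run, x, dummy, use_reentrant=True)
out.sum().backward()                        # recomputation runs; layer gets gradients
```

Without `dummy`, the reentrant checkpoint returns an output with `requires_grad=False` and `backward()` fails; whether TE itself additionally requires grad-tracking inputs is exactly the question here.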
It should not be mandatory. Could you share the error you are getting, or a small repro of the problem?
This is the stack trace:
It fails when this line in Megatron-DeepSpeed is changed to `False`: https://github.com/microsoft/Megatron-DeepSpeed/blob/4822c87ee6adfa4e480614cbe3f1d8ae00bd3db7/megatron/model/transformer.py#L1754C1-L1754C107
@timmoon10 Could you take a look at that?