Andrew Gu

Results: 159 comments by Andrew Gu

@qsh-zh Thanks for your interest in FSDP2! Your concerns make a lot of sense to me. **API Stability** IIUC, PyTorch has a feature classification of prototype -> beta -> stable....

I updated the issue tracker this morning after seeing your comment. There are still a few things that may need a bit more validation, but the main items are all...

Yes, FSDP2 should address this. The memory usage is deterministic.

I am out for a week so cannot give a detailed response, but you can look at the recordStream part of the RFC linked at the bottom of this original...

It might be nice to fold this into the `train_context`.

cc: @tianyu-l for thoughts on how to handle this. Perhaps we need separate forward and backward contexts.

I think Rohan added a tentative mixed precision API for DDP, but it never became a public feature. I think using AMP is probably the way to go.

@tianyu-l I think it's also acceptable for now to allow the `norm` to be assigned to the root module. In other words, just wrap `tok_embeddings` separately and `output` separately.
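To make the grouping in the comment above concrete, here is a toy sketch in plain Python. It does not call FSDP at all; it only assumes that a parameter is grouped with its nearest wrapped ancestor module, and that anything left over falls to the root. The function `assign_param_groups` and the parameter names other than `tok_embeddings`, `norm`, and `output` are hypothetical and chosen for illustration.

```python
def assign_param_groups(param_names, wrapped_modules):
    """Map each parameter to its nearest wrapped ancestor ("" means the root).

    Toy model only: this mimics the assumed grouping rule, not the FSDP API.
    """
    groups = {"": []}
    for module in wrapped_modules:
        groups[module] = []
    for name in param_names:
        owner = ""  # default: the parameter is left to the root module
        for module in wrapped_modules:
            # Match on "module." so that "layers.1" does not claim "layers.10".
            if name.startswith(module + ".") and len(module) > len(owner):
                owner = module
        groups[owner].append(name)
    return groups


# Hypothetical parameter names for a small two-layer transformer.
param_names = [
    "tok_embeddings.weight",
    "layers.0.attention.wq.weight",
    "layers.0.ffn.w1.weight",
    "layers.1.attention.wq.weight",
    "layers.1.ffn.w1.weight",
    "norm.weight",
    "output.weight",
]

# Wrap `tok_embeddings` and `output` separately, plus each layer; `norm` is not wrapped.
wrapped = ["tok_embeddings", "layers.0", "layers.1", "output"]

groups = assign_param_groups(param_names, wrapped)
for owner, names in groups.items():
    print(owner or "<root>", names)
```

Under this assumption, `norm.weight` is the only parameter that lands in the root group, while `tok_embeddings` and `output` each form their own group.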

cc: @weifengpy @mori360

@mingdianliu Which version of PyTorch are you using? You may need a newer version.