Andrew Gu

Results: 159 comments by Andrew Gu

I would really appreciate some pointers to the complicated initialization to learn more about it. And yes, I think that the seed checkpoint can be used to avoid the meta...

cc: @wanchaol @tianyu-l The above two pointers are good examples of real-model init methods that do not fit our current meta-device init flow. As far as I can tell, both...

@qiziAI Thanks for the PR! Could you provide some more details of the conflict for our understanding?

There was some past discussion on this (https://github.com/pytorch/torchtitan/pull/280).

Failures are all inductor-related, not FSDP2-related.