nanoGPT
Replace DDP with FSDP in train.py for its sharding capability
So far I have only imported the libraries needed to replace DDP with FSDP. I think sending the whole model to each GPU, as DDP does, isn't as efficient as sharding it across GPUs, so let's replace DDP. If I get a positive response I will continue working on it; the PR is currently at about 5% completion.
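To make the efficiency argument concrete, here is a rough back-of-the-envelope sketch of per-GPU memory for model state under replication (DDP) versus full sharding (FSDP). It is not code from this PR. The helper name `per_rank_state_bytes`, the 124M parameter count, the 8-GPU world size, and the 16 bytes per parameter (fp32 weights, fp32 gradients, and two fp32 Adam moments) are all assumptions for illustration. The estimate ignores activations, buffers, and the full parameters FSDP temporarily gathers for each wrapped unit, so real savings will be smaller.

```python
def per_rank_state_bytes(n_params: int, world_size: int,
                         sharded: bool, bytes_per_param: int = 16) -> float:
    """Rough per-GPU memory for weights, gradients, and optimizer state.

    bytes_per_param=16 is an assumption: 4 (fp32 weight) + 4 (fp32 grad)
    + 8 (two fp32 Adam moments). Activations and temporary buffers are ignored.
    """
    total = n_params * bytes_per_param
    # Replication (DDP-style): every rank holds the full state.
    # Full sharding (FSDP-style): each rank holds roughly 1/world_size of it.
    return total / world_size if sharded else total


# Hypothetical setup: a 124M-parameter model on 8 GPUs.
N_PARAMS = 124_000_000
WORLD_SIZE = 8

ddp_bytes = per_rank_state_bytes(N_PARAMS, WORLD_SIZE, sharded=False)
fsdp_bytes = per_rank_state_bytes(N_PARAMS, WORLD_SIZE, sharded=True)

print(f"replicated: {ddp_bytes / 1e9:.2f} GB per GPU")
print(f"sharded:    {fsdp_bytes / 1e9:.2f} GB per GPU")
```

Under these assumptions the model state per GPU shrinks by a factor equal to the number of GPUs. The cost is extra communication to gather parameters during the forward and backward passes, so whether the swap is a net win for a model of this size still needs to be measured.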