OLMo
Try PyTorch FSDP "HYBRID_SHARD" strategy
While this wasn't implemented as of PyTorch 1.13.1, it appears it will be in the next release, since it is already implemented on the master branch: https://github.com/pytorch/pytorch/blob/master/torch/distributed/fsdp/api.py#L31. `HYBRID_SHARD` shards parameters, gradients, and optimizer state within each node (like `FULL_SHARD`) while replicating the model across nodes (like DDP), which should keep the expensive all-gathers node-local.
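Once it ships, enabling it should look roughly like the sketch below. This is a minimal, hedged example based on the FSDP API on the master branch, not a tested OLMo change; the `nn.Linear` model and the process-group setup are placeholders for illustration.

```python
# Minimal sketch of enabling FSDP's HYBRID_SHARD strategy (assumes a PyTorch
# build where ShardingStrategy.HYBRID_SHARD is available; the model below is
# a placeholder, not OLMo's actual model).
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Standard multi-node setup: one process per GPU, launched via torchrun.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Linear(1024, 1024).cuda()  # placeholder model

# HYBRID_SHARD: shard within a node, replicate across nodes, so only the
# gradient all-reduce crosses the inter-node network.
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
```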