sys_reading
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
https://arxiv.org/pdf/2304.11277.pdf