Qingyang Wu
2 issues
https://github.com/microsoft/DialoGPT/blob/b85558dea5391f83b20120d6c93b9f79fcc72311/reddit_extractor/src/reddit.py#L108-L112
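This line does not save the optimizer state correctly when using FSDP:

https://github.com/huggingface/transformers/blob/88399476c3892435395618ed37993176dbb0de73/src/transformers/trainer.py#L2383

Under FSDP's sharded strategies, each process holds only its own shard of the optimizer state, so a per-process optimizer state dict is incomplete. It should use FSDP's `full_optim_state_dict` to gather the optimizer state from all processes:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
full_osd = FSDP.full_optim_state_dict(self.model, self.optimizer)  # collective: call on every rank
```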
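To illustrate why the per-process state is incomplete, here is a toy, single-process simulation in plain Python (no PyTorch). It is not FSDP's implementation: the parameter names and sizes in `PARAM_SIZES` and the helpers `shard_flat_state` and `gather_full_state` are hypothetical. It assumes, as a simplification of FSDP's full sharding, that all parameters are flattened into one vector, padded to a multiple of the world size, and split evenly across ranks, so each rank's optimizer sees only its slice of a per-element state such as Adam's `exp_avg`.

```python
from typing import Dict, List

# Hypothetical toy model: original parameter names and their element counts.
PARAM_SIZES: Dict[str, int] = {"linear.weight": 6, "linear.bias": 3}


def shard_flat_state(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flattened per-element state (e.g. Adam's exp_avg) into equal
    per-rank shards, zero-padding the tail so the length divides evenly
    (a simplification of how FSDP pads its flat parameter)."""
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def gather_full_state(shards: List[List[float]],
                      sizes: Dict[str, int]) -> Dict[str, List[float]]:
    """Concatenate every rank's shard, then unflatten the result back into one
    entry per original parameter name; trailing padding is never read."""
    flat = [x for shard in shards for x in shard]
    full: Dict[str, List[float]] = {}
    offset = 0
    for name, numel in sizes.items():
        full[name] = flat[offset:offset + numel]
        offset += numel
    return full


if __name__ == "__main__":
    exp_avg = [float(i) for i in range(sum(PARAM_SIZES.values()))]
    shards = shard_flat_state(exp_avg, world_size=2)
    # Saving only rank 0's local state keeps just the first shard.
    print("rank 0 local:", shards[0])
    # Gathering first recovers the state for every original parameter.
    print("gathered:", gather_full_state(shards, PARAM_SIZES))
```

With two ranks, rank 0's local shard holds only 5 of the 9 values; gathering restores all of them under the original parameter names. In real code, `FSDP.full_optim_state_dict` performs this gathering and unflattening. It is a collective, so every rank must call it, and by default (`rank0_only=True`) only rank 0 receives the populated dict, which is why the actual save is typically guarded by a rank-0 check.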