Olatunji Ruwase
> Will [FastPersist](https://arxiv.org/abs/2406.13768) be open-sourced in the next DeepSpeed release? @cailun01, yes, we plan to open-source it soon.
@Irene-123, you are welcome to give it a try. But I suspect this requires non-trivial effort and is probably not a good first issue. @zaptrem, are you able to provide guidance...
Closing due to lack of activity. Please re-open as needed.
@Mars2018, @slchenchn, @gray311 can you please provide repro steps? Thanks!
@bill4689, do you know which code is generating those outputs? I don't believe it is DeepSpeed because DeepSpeed is unaware of epochs. Can you please try to locate the source...
Closing for lack of response. Please feel free to re-open as needed.
@delock, can you help with this?
@xiyang-aads-lilly, this is a good catch. Your proposed solution looks reasonable. Are you able to provide a PR? Thanks!
@eonsparks, thanks for suggesting solutions!
@WhaleSpring, can you clarify two things to help with debugging? 1. Are checkpoints saved to local disk? The load logs contain `/tmp/` references, which typically point to node-local storage. 2....