Olatunji Ruwase

Results 627 comments of Olatunji Ruwase

> Will [FastPersist](https://arxiv.org/abs/2406.13768) be open-sourced in the next DeepSpeed release?

@cailun01, yes, we plan to open-source soon.

@Irene-123, you are welcome to give it a try. But I suspect this requires non-trivial effort and is probably not a good first issue. @zaptrem, are you able to provide guidance...

Closing due to lack of activity. Please re-open as needed.

@Mars2018, @slchenchn, @gray311, can you please provide repro steps? Thanks!

@bill4689, do you know which code is generating those outputs? I don't believe it is DeepSpeed because DeepSpeed is unaware of epochs. Can you please try to locate the source...

Closing for lack of response. Please feel free to re-open as needed.

@xiyang-aads-lilly, this is a good catch. Your proposed solution looks reasonable. Are you able to provide a PR? Thanks!

@eonsparks, thanks for suggesting solutions!

@WhaleSpring, can you clarify two things to help with debugging? 1. Are checkpoints saved onto local disk? The load logs contain `/tmp/` references, which is typically node-local storage. 2....