2.0 Release task tracker

Open johnugeorge opened this issue 7 months ago • 0 comments

To Do:

  • [ ] Add Rules Checker items for checkpointing
  • [ ] Verify memory capacity: either confirm that the data written is bigger than memory, or run reads and writes separately with a call to clear caches in between; also make sure the command that executes the writes gets printed
  • [ ] Check whether data already exists in the checkpoint directory; if so, raise an error and exit (an empty location is required)
  • [ ] Update the rules document to reflect the items above
  • [ ] Add info to rules that DLIO is the requirement and mlpstorage is the convenience methodology
  • [ ] Update the README with checkpointing commands (like those for the training workloads)
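
A minimal Python sketch of two of the checks listed above, for discussion only. The function names `require_empty_checkpoint_dir` and `written_exceeds_memory` are hypothetical and are not part of mlpstorage or DLIO; the memory size is passed in as an argument rather than detected, since how it should be measured is not settled here.

```python
import os


def require_empty_checkpoint_dir(path: str) -> None:
    """Raise RuntimeError if `path` exists and already contains entries.

    Hypothetical helper: a missing directory is treated as acceptable here,
    on the assumption that the run would create it.
    """
    if not os.path.exists(path):
        return
    with os.scandir(path) as entries:
        # next() returns None only when the directory has no entries.
        if next(entries, None) is not None:
            raise RuntimeError(f"Checkpoint directory is not empty: {path}")


def written_exceeds_memory(bytes_written: int, memory_bytes: int) -> bool:
    """Return True when the data written is strictly larger than memory."""
    return bytes_written > memory_bytes
```

A caller would run `require_empty_checkpoint_dir` before starting a checkpoint run and exit on the raised error; when `written_exceeds_memory` returns False, the alternative from the list applies (separate reads and writes with a cache clear in between).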

Testing needed:

  • [x] Verify subset checkpointing: mode=subset is set automatically when the number of processes is less than the required process count
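
The selection rule being tested can be sketched as follows. This is an illustration under assumptions, not the mlpstorage implementation: the function name `select_checkpoint_mode` and the non-subset mode name `"default"` are hypothetical; only `subset` comes from the item above.

```python
def select_checkpoint_mode(num_processes: int, required_processes: int) -> str:
    """Pick the checkpoint mode from the process counts.

    Returns "subset" when fewer processes are available than required.
    "default" is a placeholder name for the full-process-count mode.
    """
    if num_processes < 1 or required_processes < 1:
        raise ValueError("process counts must be positive")
    return "subset" if num_processes < required_processes else "default"
```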

johnugeorge, Jun 02 '25 20:06