metaseq

check model parallel ranks have consistent data

Open zdevito opened this issue 1 year ago • 0 comments

  • Check that model parallel ranks have consistent data
  • Remove a potential race condition when saving checkpoints that lets sequences_consumed get ahead of the number of iterations
  • Add code to fix up potentially broken data loaders on a restart
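
For readers unfamiliar with the first item, here is a minimal sketch of what a cross-rank consistency check could look like. It is not the code from this change: `batch_fingerprint` and `find_inconsistent_ranks` are hypothetical names, and the sketch assumes each rank's fingerprint has already been gathered into a list (in a PyTorch job, a collective such as `torch.distributed.all_gather_object` could do that).

```python
import hashlib
import struct
from typing import Iterable, List, Sequence


def batch_fingerprint(token_ids: Iterable[int]) -> str:
    """Return a short, deterministic digest of one rank's batch (hypothetical helper)."""
    h = hashlib.sha256()
    for t in token_ids:
        # Fixed-width encoding so that [1, 23] and [12, 3] hash differently.
        h.update(struct.pack("<q", t))
    return h.hexdigest()[:16]


def find_inconsistent_ranks(fingerprints: Sequence[str]) -> List[int]:
    """Return the ranks whose fingerprint differs from rank 0's (hypothetical helper)."""
    return [rank for rank, fp in enumerate(fingerprints) if fp != fingerprints[0]]


def check_consistent(fingerprints: Sequence[str]) -> None:
    """Raise if any rank in the group saw different data than rank 0."""
    bad = find_inconsistent_ranks(fingerprints)
    if bad:
        raise RuntimeError(f"model parallel ranks {bad} disagree with rank 0 on the input batch")
```

Comparing short digests rather than full batches keeps the extra communication small. Rank 0 is used as the reference here only for simplicity; any rank in the group would do.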

zdevito • Dec 24 '22 00:12