Fault Tolerance

Open · rlnsanz opened this issue 5 years ago · 1 comment

  • Resume training after a crash.
  • Write valid logs whenever possible, even when a failure occurs (via exception handling).
  • Reconstruct logs after sudden failures that exception handling cannot catch.
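The first two items could look roughly like the sketch below. It is not Flor's implementation: `write_json_atomically`, `logged_run`, `load_checkpoint`, and `train` are hypothetical names, and the checkpoint holds only a hypothetical `next_epoch` counter. The log is written to a temporary file and swapped in with `os.replace`, so a reader never sees a half-written file. A `finally` block ensures a log with a `failed` status is still written when the training body raises.

```python
import json
import os
import tempfile
from contextlib import contextmanager


def write_json_atomically(path, data):
    """Write `data` as JSON so readers see the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory keeps os.replace on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp_path)
        raise


@contextmanager
def logged_run(path):
    """Hypothetical helper: collect records and always write a valid log, even on exceptions."""
    log = {"status": "running", "records": []}
    try:
        yield log["records"]
        log["status"] = "completed"
    except BaseException as exc:
        log["status"] = "failed"
        log["error"] = repr(exc)
        raise  # re-raise after recording the failure
    finally:
        write_json_atomically(path, log)


def load_checkpoint(path):
    """Return the epoch to resume from, or 0 if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["next_epoch"]
    except (FileNotFoundError, json.JSONDecodeError, KeyError):
        return 0


def train(num_epochs, ckpt_path, log_path):
    """Toy training loop that resumes from the last checkpoint after a crash."""
    start = load_checkpoint(ckpt_path)
    with logged_run(log_path) as records:
        for epoch in range(start, num_epochs):
            loss = 1.0 / (epoch + 1)  # placeholder metric standing in for real training
            records.append({"epoch": epoch, "loss": loss})
            # Checkpoint after each epoch so a crash loses at most one epoch of work.
            write_json_atomically(ckpt_path, {"next_epoch": epoch + 1})
```

Calling `train(5, ...)` after an earlier run stopped at epoch 3 starts again at epoch 3 instead of epoch 0. This only covers failures Python can observe. A `kill -9` or power loss skips the `finally` block, which is why the third item (reconstruction) is needed.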

rlnsanz · Feb 22 '21 15:02

The goal is to make it possible to resume a crashed execution, and to run hindsight logging over a crashed execution that left behind a partial or invalid MEMO.json file.
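The issue does not describe the layout of MEMO.json. One common way to tolerate a partial log is sketched below under that caveat. It assumes, hypothetically, an append-only file with one JSON object per line (JSON Lines), so a crash can corrupt at most the entry being written. A reader can then recover every complete entry and find a resume point. `append_record`, `read_partial_log`, `next_epoch`, and the `epoch` field are illustrative names, not Flor's API.

```python
import json
import os


def append_record(path, record):
    """Append one record and force it to disk so earlier entries survive a crash."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def read_partial_log(path):
    """Return (records, skipped) from a JSON Lines log that may end mid-write.

    A truncated or corrupted line is counted and skipped instead of
    making the whole file unreadable.
    """
    records = []
    skipped = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                skipped += 1  # truncated or corrupted entry
    return records, skipped


def next_epoch(records):
    """Epoch to resume from: one past the last fully recorded epoch (0 if none)."""
    return max((r["epoch"] for r in records), default=-1) + 1
```

If a crash leaves `{"epoch": 2, "lo` at the end of the file, the reader returns the entries for epochs 0 and 1, reports one skipped line, and suggests resuming at epoch 2. A single JSON document rewritten in place offers no such prefix to recover, which is one reason append-only logs are popular for this kind of fault tolerance.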

rlnsanz · Feb 22 '21 15:02