Dagger.jl

Implement chaos testing framework

Open jpsamaroo opened this issue 4 years ago • 0 comments

As the scheduler gains more optimizations, options, and supported features, the number of configuration combinations that it needs to handle correctly grows exponentially. We could do ourselves a great service by automatically testing random configurations as part of CI. If that works out well, we could also add fault injection.

Parameters that would vary:

  • DAG size and shape
  • Anonymous and named function thunks
  • Thunk/Scheduler options
  • Processor types (ThreadProc for now)
  • Checkpointing
  • Dynamic DAG extension and querying
  • SIGINT handling
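
To make the first two parameters concrete, here is a minimal sketch of how a random DAG could be generated from a seed and executed. None of this is existing Dagger code: `RandomDAG`, `random_dag`, `run_dag`, `combine`, and `anon_combine` are hypothetical names. By default the DAG runs serially, and that serial run doubles as the reference result for a scheduler run.

```julia
using Random

# Hypothetical description of one random DAG: node i combines the results of
# the earlier nodes listed in deps[i], so the graph is acyclic by construction.
struct RandomDAG
    deps::Vector{Vector{Int}}
    named::Vector{Bool}  # true => named function, false => anonymous function
end

# Named and anonymous versions of the same operation, to cover both thunk kinds.
combine(xs...) = foldl(+, xs; init=0) + 1
const anon_combine = (xs...) -> foldl(+, xs; init=0) + 1

# Draw a DAG whose size and shape depend only on the RNG state, so a failing
# configuration can be reproduced from its seed.
function random_dag(rng::AbstractRNG; max_nodes::Int=20, max_deps::Int=3)
    n = rand(rng, 1:max_nodes)
    deps = Vector{Vector{Int}}(undef, n)
    for i in 1:n
        k = min(i - 1, rand(rng, 0:max_deps))
        deps[i] = k == 0 ? Int[] : sort!(randperm(rng, i - 1)[1:k])
    end
    return RandomDAG(deps, rand(rng, Bool, n))
end

# Execute the DAG through a pluggable `spawn`/`getresult` pair.
# The defaults call each function directly (serial reference execution).
function run_dag(dag::RandomDAG; spawn=(f, args...) -> f(args...), getresult=identity)
    results = Vector{Any}(undef, length(dag.deps))
    for i in eachindex(dag.deps)
        f = dag.named[i] ? combine : anon_combine
        results[i] = spawn(f, (results[j] for j in dag.deps[i])...)
    end
    return [getresult(r) for r in results]
end

seed = 42
dag = random_dag(MersenneTwister(seed))
reference = run_dag(dag)
```

Assuming `Dagger.spawn` accepts a function followed by its arguments and resolves task arguments before the call, a scheduler run could then be compared against the reference with `run_dag(dag; spawn=Dagger.spawn, getresult=fetch) == reference`. Scheduler options, processor types, checkpointing, and SIGINT handling are not modelled here.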

Some metrics we'd want to test:

  • Correctness
  • Total runtime length
  • Per-process memory usage
  • Caching statistics
  • Network transfer statistics
  • Visualization output
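
As a rough illustration of how the first few metrics might be recorded per trial, the sketch below wraps a workload in Julia's `@timed` and compares its result against a reference. `TrialResult` and `run_trial` are hypothetical names, and the workload is a stand-in, not a Dagger computation. Note that `@timed` reports bytes allocated in the calling process, which is only a proxy for per-process memory usage; caching, network, and visualization metrics are not covered.

```julia
# Hypothetical record of one chaos-test trial.
struct TrialResult
    seed::Int
    correct::Bool     # did the workload match the reference result?
    seconds::Float64  # wall-clock runtime of the workload
    bytes::Int        # bytes allocated while running (not resident memory)
end

# Run `workload(seed)` under `@timed` and compare it to `reference(seed)`.
function run_trial(workload, reference, seed::Integer)
    expected = reference(seed)
    stats = @timed workload(seed)
    return TrialResult(seed, isequal(stats.value, expected), stats.time, stats.bytes)
end

# Stand-in workload: a sum of squares checked against its closed form.
workload(seed) = sum(abs2, 1:seed)
reference(seed) = seed * (seed + 1) * (2seed + 1) ÷ 6

results = [run_trial(workload, reference, s) for s in 1:5]
failures = filter(r -> !r.correct, results)
```

Keeping the seed in each record means any entry in `failures` can be re-run in isolation.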

jpsamaroo • Apr 20 '21 21:04