Minimize parallel health checks for cloud targets
The Need
Health checks for cloud targets are currently run daily for each Space. The job that schedules these health checks runs every 30 seconds. These interval values are not configurable, and can only be modified by a code change.
These health checks can overwhelm a system in certain circumstances:
- A large number of Spaces
- A large number of Cloud Targets in each Space
- A small number of Workers
- Workers with restricted resources (disk access speed, CPU, etc.)
Solution
Stagger the queueing of health checks to spread the load placed on Workers. For example, rather than scheduling a batch of 24 health checks once a day, schedule 1 health check every hour.
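The staggering idea can be illustrated with a minimal sketch. This is not the product's implementation: the function name `staggered_offsets` and its parameters are hypothetical, and it assumes the checks should simply be spread evenly across one period (a day by default).

```python
from datetime import timedelta


def staggered_offsets(check_count, period=timedelta(days=1)):
    """Return one start offset per health check, spread evenly across the period.

    Hypothetical helper for illustration only; names and behaviour are assumptions.
    """
    if check_count <= 0:
        return []
    # Divide the period into equal slots, one per health check.
    step = period / check_count
    # Check i is queued i slots after the start of the period.
    return [step * i for i in range(check_count)]


# 24 health checks over one day are queued one hour apart
# instead of all at once.
offsets = staggered_offsets(24)
```

With this approach the total number of health checks per day is unchanged; only the moment at which each one is queued differs, so Workers see a steady trickle rather than a single burst.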