postgres
test: error handle, state mgmt, backoff, timeouts
What kind of change does this PR introduce?
EC2 Test Resilience Improvements
Retry Wrapper Function
- Added `retry_with_backoff` decorator that implements exponential backoff
- Configurable retry attempts, delays, and exception types
- Proper logging of retry attempts and failures
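
For reviewers who want a picture of what such a decorator typically looks like, here is a minimal sketch. Only the name `retry_with_backoff` comes from this PR; the parameter names (`max_attempts`, `initial_delay`, `max_delay`, `exceptions`, `sleep`) and the default values are assumptions for illustration and may not match the actual diff.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def retry_with_backoff(max_attempts=3, initial_delay=1.0, max_delay=30.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failure.

    All parameter names and defaults are illustrative assumptions.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == max_attempts:
                        # Out of attempts: log and re-raise the last error.
                        logger.error("%s failed after %d attempts: %s",
                                     func.__name__, attempt, exc)
                        raise
                    logger.warning("%s attempt %d/%d failed: %s; retrying in %.1fs",
                                   func.__name__, attempt, max_attempts, exc, delay)
                    sleep(delay)
                    # Exponential growth, capped at max_delay.
                    delay = min(delay * 2, max_delay)
        return wrapper
    return decorator
```

Exceptions that are not listed in `exceptions` propagate immediately without a retry, which keeps genuine bugs from being hidden behind the backoff loop. The injectable `sleep` is only there so the behavior can be tested without waiting.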
Error Handling and Logging
- Added comprehensive error handling throughout the code
- Improved logging with detailed messages and error context
- Added proper exception handling for AWS API calls
Instance State Management
- Added `wait_for_instance_running` function with retries
- Added proper state validation before proceeding
- Added timeout for instance state transitions
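
A rough sketch of the polling idea, assuming the test holds a boto3 EC2 `Instance` resource (which exposes `reload()` and a `state` dict with a `"Name"` key). The function name is from this PR; the signature, the default timeout, and the set of states treated as fatal are assumptions.

```python
import time


def wait_for_instance_running(instance, timeout=300.0, poll_interval=5.0,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll the instance until it reports 'running' or the timeout expires.

    `instance` is assumed to behave like a boto3 EC2 Instance resource.
    """
    deadline = clock() + timeout
    while True:
        instance.reload()  # refresh cached attributes from the EC2 API
        state = instance.state["Name"]
        if state == "running":
            return
        # These states cannot lead back to 'running' without intervention,
        # so fail fast instead of waiting for the timeout.
        if state in ("shutting-down", "terminated", "stopping", "stopped"):
            raise RuntimeError(f"instance entered unexpected state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"instance still {state!r} after {timeout}s")
        sleep(poll_interval)
```

boto3 also ships a built-in waiter, `instance.wait_until_running()`; a hand-rolled loop like this one is mainly useful when the test needs its own timeout, logging, or early exit on terminal states.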
Backoff Strategy
- Implemented exponential backoff in the retry decorator
- Configurable initial delay and maximum delay
- Proper sleep intervals between retries
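
To make the schedule concrete, the helper below computes the sleep intervals a capped exponential backoff would produce. `backoff_delays` is a hypothetical name used only for this illustration, and the doubling factor and defaults are assumptions rather than values taken from the PR.

```python
def backoff_delays(attempts, initial_delay=1.0, max_delay=30.0, factor=2.0):
    """Return the sleep interval before each retry, capped at max_delay."""
    return [min(initial_delay * factor ** n, max_delay) for n in range(attempts)]
```

With these defaults, six retries would wait 1, 2, 4, 8, 16, and 30 seconds: the sixth interval would be 32 seconds uncapped, so `max_delay` takes over.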
Resource Validation
- Added `validate_aws_resources` function to check security groups and IAM roles
- Validates resources before instance creation
- Provides clear error messages for validation failures
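
One possible shape for this pre-flight check, sketched under the assumption that boto3 clients are passed in (`boto3.client("ec2")` and `boto3.client("iam")`). `describe_security_groups` and `get_role` are real boto3 client calls; the function signature and the choice to raise a single `ValueError` are assumptions.

```python
def validate_aws_resources(ec2_client, iam_client, security_group_ids, role_name):
    """Check that the security groups and IAM role exist before launching.

    Collects every problem and raises once, so a single run reports all of them.
    The signature is an illustrative assumption.
    """
    problems = []
    group_ids = list(security_group_ids)
    try:
        ec2_client.describe_security_groups(GroupIds=group_ids)
    except Exception as exc:
        # Real code would likely narrow this to botocore.exceptions.ClientError.
        problems.append(f"security groups {group_ids}: {exc}")
    try:
        iam_client.get_role(RoleName=role_name)
    except Exception as exc:
        problems.append(f"IAM role {role_name!r}: {exc}")
    if problems:
        raise ValueError("AWS resource validation failed: " + "; ".join(problems))
```

Failing here, before `run_instances` is ever called, means a misconfigured account produces one readable error instead of a half-created instance that later has to be cleaned up.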
Simplified Startup
- Broke down the instance creation process into smaller, focused functions
- Each function has a single responsibility
- Better error isolation and handling
AWS API Timeouts
- Added proper timeouts for SSH connections
- Added timeout for health checks
- Added timeout for instance state transitions
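
As an illustration of bounding the SSH wait, the sketch below probes the SSH port with a plain TCP connect and an explicit timeout. This is a stand-in technique using only the standard library, not necessarily how the PR connects; `ssh_port_reachable` is a hypothetical helper name.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=10.0):
    """Return True if a TCP connection to the SSH port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and socket timeouts.
        return False
```

A check like this only proves the port accepts connections; an SSH client still needs its own connect and command timeouts so that a hung session cannot stall the CI job.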
Robust Health Checks
- Improved health check system with proper error handling
- Added timeout for health checks
- Better logging of health check failures
- Separate function for checking individual services
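
The split between a per-service check and an overall timeout could look roughly like this. Both function names and the probe-callable interface are assumptions made for the sketch; the PR's real checks presumably run commands on the instance.

```python
import logging
import time

logger = logging.getLogger(__name__)


def check_service(name, probe):
    """Run one service probe; treat exceptions as failures and log them."""
    try:
        healthy = bool(probe())
    except Exception as exc:
        logger.warning("health check for %s raised: %s", name, exc)
        return False
    if not healthy:
        logger.warning("health check for %s reported unhealthy", name)
    return healthy


def wait_for_healthy(probes, timeout=120.0, poll_interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll every probe until all pass; raise TimeoutError naming the failures.

    `probes` maps a service name to a zero-argument callable returning truthy
    when the service is healthy.
    """
    deadline = clock() + timeout
    while True:
        failing = [name for name, probe in probes.items()
                   if not check_service(name, probe)]
        if not failing:
            return
        if clock() >= deadline:
            raise TimeoutError(f"services still unhealthy after {timeout}s: {failing}")
        sleep(poll_interval)
```

Naming the failing services in the timeout message is what makes a sporadic CI failure diagnosable from the log alone.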
Cleanup Code
- Added proper cleanup in `finally` block
- Ensures instance termination even on failures
- Logs cleanup failures
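
The cleanup pattern can be sketched as follows, assuming the instance object exposes `terminate()` as a boto3 EC2 `Instance` resource does. `run_with_cleanup` and its two callable parameters are hypothetical names used only to keep the example self-contained.

```python
import logging

logger = logging.getLogger(__name__)


def run_with_cleanup(create_instance, run_tests):
    """Create an instance, run the tests, and always attempt termination."""
    instance = None
    try:
        instance = create_instance()
        return run_tests(instance)
    finally:
        # instance stays None if creation itself failed, so there is nothing to clean up.
        if instance is not None:
            try:
                instance.terminate()
            except Exception:
                # Log rather than raise so a cleanup error does not mask the test failure.
                logger.exception("failed to terminate instance during cleanup")
```

Swallowing the cleanup error is a deliberate trade-off: the original test failure stays visible, but a leaked instance is only reported in the log, so that log line is worth alerting on.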
Detailed Logging
- Added comprehensive logging throughout
- Logs all major operations and state transitions
- Logs errors with proper context
- Helps diagnose failures
Will this help the sporadic timeouts we get from time to time on the testinfra CI job?
Steve, yes, I am trying to target that. I am going to wait on this until I finish https://github.com/supabase/postgres/pull/1547 as that will let me iterate on this locally