fix: handle docker container startup race condition in local provisioner
Replaced panic with proper error handling when creating exec commands in containers that are still starting up. The previous implementation would panic with a 409 error when the container was not yet running, causing shuttle run to crash.
This change adds retry logic with container state inspection to gracefully wait for containers to become ready, distinguishing between transient startup delays and actual failures.
Changes made
- Added retry mechanism with 60 attempts over 30 seconds for container readiness checks
- Replaced
.expect()calls with properResulterror handling usingbail!()and.context() - Added container state inspection to differentiate between containers that are starting up versus containers that have failed
- Added detailed error messages with container status and exit codes for debugging
Closes #2133
How has this been tested?
Tested locally with shuttle run using a project with DatabaseSharedPostgres resource. The container now starts successfully without panicking, properly waiting for the container to be ready before attempting to execute commands.
Greptile Overview
Greptile Summary
This PR fixes a race condition where shuttle run would panic when trying to execute commands in Docker containers that were still starting up. The fix replaces .expect() panics with proper error handling and adds retry logic with container state inspection.
Key changes:
- Added retry mechanism (60 attempts over 30 seconds) for container readiness checks
- Replaced panic-inducing
.expect()calls withResult-based error handling usingbail!()and.context() - Added container state inspection to differentiate between starting containers and failed containers
- Added detailed error messages with container status and exit codes for debugging
Suggestions:
- Consider adding function-level documentation to explain the complex retry and race condition handling logic
- The conditional logic at lines 271-281 could be simplified as both the "running" and "else" branches perform identical operations
Confidence Score: 4/5
- This PR is safe to merge with minimal risk - it properly handles a critical race condition that was causing panics
- Score reflects that the core fix is solid and addresses the reported issue effectively. The retry logic with state inspection is a correct approach to handling Docker container startup timing. Minor deductions for: (1) lack of function documentation for complex logic as per project guidelines, and (2) some redundant conditional logic that could be simplified for better readability. These are style improvements rather than functional issues.
- No files require special attention - the changes are well-contained and the logic is sound
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| cargo-shuttle/src/provisioner_server.rs | 4/5 | Added retry logic with proper error handling to fix container startup race condition. Implementation is mostly solid but has minor redundant logic in state checking. |
Hi, can anyone look at this PR so that the DB startup issue can be fixed?
Thank you.