shuttle fix: handle docker container startup race condition in local provisioner

Replaced panic with proper error handling when creating exec commands in containers that are still starting up. The previous implementation would panic with a 409 error when the container was not yet running, causing shuttle run to crash.

This change adds retry logic with container state inspection to gracefully wait for containers to become ready, distinguishing between transient startup delays and actual failures.

Changes made

Added retry mechanism with 60 attempts over 30 seconds for container readiness checks
Replaced .expect() calls with proper Result error handling using bail!() and .context()
Added container state inspection to differentiate between containers that are starting up versus containers that have failed
Added detailed error messages with container status and exit codes for debugging

Closes #2133

How has this been tested?

Tested locally with shuttle run using a project with DatabaseSharedPostgres resource. The container now starts successfully without panicking, properly waiting for the container to be ready before attempting to execute commands.

Nov 22 '25 18:11 7ttp

Greptile Overview

Greptile Summary

This PR fixes a race condition where shuttle run would panic when trying to execute commands in Docker containers that were still starting up. The fix replaces .expect() panics with proper error handling and adds retry logic with container state inspection.

Key changes:

Added retry mechanism (60 attempts over 30 seconds) for container readiness checks
Replaced panic-inducing .expect() calls with Result-based error handling using bail!() and .context()
Added container state inspection to differentiate between starting containers and failed containers
Added detailed error messages with container status and exit codes for debugging

Suggestions:

Consider adding function-level documentation to explain the complex retry and race condition handling logic
The conditional logic at lines 271-281 could be simplified as both the "running" and "else" branches perform identical operations

Confidence Score: 4/5

This PR is safe to merge with minimal risk - it properly handles a critical race condition that was causing panics
Score reflects that the core fix is solid and addresses the reported issue effectively. The retry logic with state inspection is a correct approach to handling Docker container startup timing. Minor deductions for: (1) lack of function documentation for complex logic as per project guidelines, and (2) some redundant conditional logic that could be simplified for better readability. These are style improvements rather than functional issues.
No files require special attention - the changes are well-contained and the logic is sound

Important Files Changed

File Analysis

Filename	Score	Overview
cargo-shuttle/src/provisioner_server.rs	4/5	Added retry logic with proper error handling to fix container startup race condition. Implementation is mostly solid but has minor redundant logic in state checking.

Nov 22 '25 18:11 greptile-apps[bot]

Hi, can anyone look at this PR so that the DB startup issue can be fixed?

Thank you.

Nov 28 '25 17:11 jymchng