fixes: #6585 add Postgres healthcheck and make Docker workflow wait + retries
Fixes #6585. This PR stabilizes the Docker-based CI workflow by ensuring that Postgres is fully ready before the Rails tests begin, and by adding lightweight retry logic around the test command. The root cause of the flakiness was that tests could start before Postgres had finished initializing, causing intermittent failures.
Wouldn't it be better to change docker-compose.yml to use condition: service_healthy for the app, rather than the custom logic in .github/workflows/docker.yml?
depends_on:
  db:
    condition: service_healthy
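For context, a minimal sketch of how that could fit together in docker-compose.yml. The db service name comes from the snippet above; the app service name, the image, the postgres user, and the interval, timeout and retry values are assumptions for illustration and are not taken from the repository:

```yaml
services:
  db:
    image: postgres  # assumed image; keep whatever the project already uses
    healthcheck:
      # pg_isready exits 0 once the server is accepting connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]  # assumed user
      interval: 5s   # assumed value
      timeout: 5s    # assumed value
      retries: 10    # assumed value
  app:
    build: .
    depends_on:
      db:
        # Compose starts app only after the db healthcheck passes
        condition: service_healthy
```

With this in place, Compose itself delays the app container until the database reports healthy, so the workflow would not need its own wait loop.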
You're absolutely right! Using condition: service_healthy would be much cleaner. Please let me know more, as this is my first PR.
So the database healthcheck thing doesn't look unreasonable, but I can't actually find any failing cases that look like they are caused by the database not being ready, so I'd be interested to know what made you think it was an issue?
You've also extended the (recently introduced) timeout for the job from 20 minutes to 40 minutes - do you have some cases in mind where the current timeout was too short?
Finally, you've added a couple of retry loops with no real explanation of why you think they're a good idea - if docker build fails once, why would you think running it a second time would help? That suggests that transient failures of some sort are common - do you have an example of that? Ideally the solution to such things is to try and fix the transient failures, not to work around them with a retry loop.
I've extracted the db healthcheck into a separate PR, and tweaked some aspects of it - see #6626.
I don't believe the other retry loops are required, and I think the custom bash retry logic is far too complex for both the situation and the (un)likely benefits. It definitely won't fix #6585 since the workflows are failing at unrelated locations.
@manik3160 thanks for making this PR! We've used some of it, but we aren't going to use the rest, so I'm going to close this now.