CI gets stuck until it times out
Description
Recently, the Redis CI has been getting stuck frequently, especially in slow environments. When I rolled back to the version from one year ago, it also got stuck, so the recent changes are not the cause. The affected tests have one thing in common: they send a large number of commands. I added some logging and found that they always get stuck partway through sending commands, while Redis itself shows no issues. Affected runs:
- sundb/redis/actions/runs/15722400756/job/44305647413
- redis/redis/actions/runs/15720680684/job/44300809458
- redis/redis/actions/runs/15668828605/job/44136574780
When I pinned the Ubuntu version to 22.04, all of these problems disappeared.
Platforms affected
- [ ] Azure DevOps
- [x] GitHub Actions - Standard Runners
- [ ] GitHub Actions - Larger Runners
Runner images affected
- [ ] Ubuntu 22.04
- [x] Ubuntu 24.04
- [ ] macOS 13
- [ ] macOS 13 Arm64
- [ ] macOS 14
- [ ] macOS 14 Arm64
- [ ] macOS 15
- [ ] macOS 15 Arm64
- [ ] Windows Server 2019
- [ ] Windows Server 2022
- [ ] Windows Server 2025
Image version and build link
Image: ubuntu-24.04 Version: 20250609.1.0 Included Software: https://github.com/actions/runner-images/blob/ubuntu24/20250609.1/images/ubuntu/Ubuntu2404-Readme.md Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu24%2F20250609.1
Is it regression?
no
Expected behavior
CI tests should complete without hanging.
Actual behavior
CI gets stuck until it times out.
Repro steps
no
Hi @sundb, Thank you for bringing this issue to our attention. We will look into this issue and will update you after investigating.
@hemanthmanga thanks.
Hi @sundb, could you please re-run your workflow and share the logs with me?
@akilesh-amaran sure, I'll run one.
Hi @akilesh-amaran, this is the CI I just ran. I’ve been investigating the cause of this issue over the past couple of days. I found that the tests get stuck in Tcl’s write() and flush(). In the failing tests, we write 500,000 commands, but at around 300,000 the writes start to stall. Initially, the delay is about 6 seconds, and as we continue writing more commands, the delay becomes increasingly longer. The stall follows a very consistent pattern. Thanks for your help.
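The stall pattern described above (many commands written in a row, with write/flush latency growing partway through) could be probed outside the Tcl test suite with something like the following. This is only a minimal Python sketch under stated assumptions, not the Redis tests themselves: the local sink server, the function names, the batch size, and the `slow_after` threshold are all hypothetical choices for illustration, and it has not been confirmed to reproduce the hang.

```python
import socket
import threading
import time


def start_sink_server():
    """Start a local TCP server that reads and discards everything it receives.

    Hypothetical stand-in for Redis: it never replies, it only drains input.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        with conn:
            while conn.recv(65536):  # empty bytes means the client closed
                pass
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def measure_flush_stalls(port, total=500_000, batch=10_000, slow_after=1.0):
    """Write `total` commands in batches and time each write+flush.

    Returns (sent, stalls), where `stalls` holds (commands_sent_so_far, seconds)
    for every batch whose write+flush took longer than `slow_after` seconds.
    """
    command = b"*1\r\n$4\r\nPING\r\n"  # RESP encoding of a PING command
    stalls = []
    sent = 0
    with socket.create_connection(("127.0.0.1", port)) as sock:
        out = sock.makefile("wb")
        while sent < total:
            n = min(batch, total - sent)
            start = time.monotonic()
            out.write(command * n)
            out.flush()
            elapsed = time.monotonic() - start
            sent += n
            if elapsed > slow_after:
                stalls.append((sent, elapsed))
        out.close()
    return sent, stalls


if __name__ == "__main__":
    port = start_sink_server()
    sent, stalls = measure_flush_stalls(port)
    print(f"sent {sent} commands, {len(stalls)} slow flushes")
    for count, seconds in stalls:
        print(f"  after {count} commands: flush took {seconds:.2f}s")
```

Running the same script on ubuntu-22.04 and ubuntu-24.04 runners and comparing the reported slow flushes might help show whether the delay depends on the image rather than on Redis or Tcl.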
Hi @sundb, could you please update the sync and try with the latest version?
@akilesh-amaran please take a look: https://github.com/redis/redis/actions/runs/16070509266 still stuck, thx.
> Could you please update the sync and try with the latest version.
We use ubuntu-latest by default, so is it always the latest?
Thank you for confirming that ubuntu-latest was used. As expected, it defaults to the latest available version. We truly appreciate your follow-up and will keep you informed of any further updates.
Hi @sundb , Thanks so much for sharing the workflow. We had a look, but since it includes quite a few custom configurations, it's a bit tricky for us to pinpoint what's causing the issue. If possible, could you help us out with a simplified version of the workflow that still reproduces the problem? A minimal example would really help us investigate more effectively and get to the bottom of it faster.
@akilesh-amaran thanks for your help, I'll try it.
@akilesh-amaran can you take a look at this comment: https://github.com/redis/redis/issues/14196#issuecomment-3082997931 ? thx.