ubuntu-24.04 20250602.3.0 - ci tests hang indefinitely

Open saimedhi opened this issue 6 months ago • 4 comments

Description

  • ubuntu-24.04 20250602.3.0 - ci tests hang indefinitely
  • Similar issue as https://github.com/actions/runner-images/issues/12273

Platforms affected

  • [ ] Azure DevOps
  • [x] GitHub Actions - Standard Runners
  • [ ] GitHub Actions - Larger Runners

Runner images affected

  • [ ] Ubuntu 22.04
  • [x] Ubuntu 24.04
  • [ ] macOS 13
  • [ ] macOS 13 Arm64
  • [ ] macOS 14
  • [ ] macOS 14 Arm64
  • [ ] macOS 15
  • [ ] macOS 15 Arm64
  • [ ] Windows Server 2019
  • [ ] Windows Server 2022
  • [ ] Windows Server 2025

Image version and build link

Image: ubuntu-24.04
Version: 20250602.3.0
Included Software: https://github.com/actions/runner-images/blob/ubuntu24/20250602.3/images/ubuntu/Ubuntu2404-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu24%2F20250602.3

Is it regression?

Expected behavior

CI tests should complete without hanging.

Actual behavior

CI tests hang indefinitely.

Repro steps

  • Issue observed in opensearch-py repo

Example:

  • https://github.com/opensearch-project/opensearch-py/actions/runs/15404057320/job/43488379132?pr=912

saimedhi avatar Jun 04 '25 18:06 saimedhi

@vidyasagarnimmagaddi, can you please take a look at this issue?

saimedhi avatar Jun 04 '25 18:06 saimedhi

Hi @saimedhi - Thanks for bringing this issue to our notice. We are looking into it. Thanks.

subir0071 avatar Jun 04 '25 20:06 subir0071

Hi @saimedhi - It is very difficult for us to go through the workflow logs of individual repositories. Please share a simple workflow that pinpoints the issue. Thanks for your understanding.

subir0071 avatar Jun 05 '25 15:06 subir0071

@subir0071, Thank you for looking into this issue.

  • This GitHub workflow gets stuck when secured is true:

https://github.com/opensearch-project/opensearch-py/blob/main/.github/workflows/integration.yml#L22

Here is an example PR where we are seeing this issue: https://github.com/opensearch-project/opensearch-py/pull/912

  • All the integ tests get stuck where security = true, for example: https://github.com/opensearch-project/opensearch-py/actions/runs/15404057320/job/43488379121?pr=912

saimedhi avatar Jun 05 '25 21:06 saimedhi

@vidyasagarnimmagaddi, @subir0071 please let me know if you need any more details.

saimedhi avatar Jun 06 '25 14:06 saimedhi

@vidyasagarnimmagaddi, @subir0071 please take a look at this issue

saimedhi avatar Jun 09 '25 15:06 saimedhi

Hi @saimedhi - For me it is not getting stuck, and the job runs to completion: https://github.com/subir0071/opensearch-py/actions/runs/15540770790/job/43750758173

However, we are looking into some other parameters of this anomaly. We will keep you posted. Thanks.

subir0071 avatar Jun 09 '25 18:06 subir0071

ISSUE NOT YET FIXED.

@subir0071, the previous issue with the security-enabled integ tests hanging is fixed, but a few other tests that were previously passing are now hanging. The issue occurs when unit tests run on Windows. Please refer to the same PR: https://github.com/opensearch-project/opensearch-py/pull/912

saimedhi avatar Jun 09 '25 23:06 saimedhi

@subir0071 Any update on this issue?

saimedhi avatar Jun 10 '25 15:06 saimedhi

All the jobs seem to be passing in the shared workflow - https://github.com/opensearch-project/opensearch-py/actions/runs/15404057309/job/43768612788?pr=912

Please share the latest workflow (simpler, if possible) for us to analyze as we are struggling to repro the issue at our end.

subir0071 avatar Jun 10 '25 20:06 subir0071

@subir0071

I will rerun all tests and confirm if the issue is fixed. Thank you for the response

saimedhi avatar Jun 10 '25 20:06 saimedhi

@subir0071 , still seeing the issue.

Example PR: https://github.com/opensearch-project/opensearch-py/pull/912

Windows CI workflows are stuck https://github.com/opensearch-project/opensearch-py/actions/runs/15404057309/job/43843653096?pr=912

saimedhi avatar Jun 10 '25 21:06 saimedhi

I still do not see any stuck or failed job in the shared workflow link. Here is the screenshot of that shared page -

Image

Please let me know if I am still missing something.

subir0071 avatar Jun 10 '25 22:06 subir0071

Here is the difference:

  • Windows tests initially get stuck and, unusually, complete only after 1 hr 3 mins.
  • Tests on other platforms, by contrast, take 1 min 36 sec.
  • Nothing has changed in opensearch-py recently that could cause this.
Image Image

saimedhi avatar Jun 11 '25 15:06 saimedhi

Hi @saimedhi,
Could you please try running the jobs once again? Thanks.

vidyasagarnimmagaddi avatar Jun 12 '25 17:06 vidyasagarnimmagaddi

Recently, Redis CI has also been getting stuck frequently, especially in a slow environment. When I rolled back to the version from one year ago, it also got stuck, so it wasn't caused by recent modifications. All these tests have one thing in common: they send a large number of commands. I added some logging and found that they always get stuck in the middle of sending commands, but Redis itself has no issues. https://github.com/sundb/redis/actions/runs/15722400756/job/44305647413 https://github.com/redis/redis/actions/runs/15720680684/job/44300809458 https://github.com/redis/redis/actions/runs/15668828605/job/44136574780

When I specified the ubuntu version as 22.04, all these problems disappeared.

sundb avatar Jun 18 '25 03:06 sundb

Hi @sundb, kindly raise a separate issue in the public repo.

vidyasagarnimmagaddi avatar Jun 18 '25 05:06 vidyasagarnimmagaddi

Hi @saimedhi, as we haven't heard from you, we are closing this issue. Thanks.

vidyasagarnimmagaddi avatar Jun 18 '25 05:06 vidyasagarnimmagaddi

Will reopen if I notice the issue again. Thank you.

saimedhi avatar Jun 18 '25 05:06 saimedhi