setup-python

Intermittent failures during Post Setup Python step on macOS

andrewkho opened this issue 9 months ago.

I'm new to GitHub Actions and I'm having trouble understanding this failure; apologies if this isn't the right way to flag the issue.

Description: The Post Setup Python step fails intermittently on macos-latest. Even on successful runs, it is much slower to clean up / shut down than on Windows and Linux.

Action version: Tested with Actions v3/v4 and setup-python v4/v5

Platform:

  • [ ] Ubuntu
  • [x] macOS
  • [ ] Windows

Runner type:

  • [x] Hosted
  • [ ] Self-hosted

Tools version: 3.8, 3.9, 3.10

Repro steps:

The original workflow yaml is here: https://github.com/pytorch/data/blob/main/.github/workflows/stateful_dataloader_ci.yml

In the failed run below, I updated actions from v3 to v4 and setup-python from v4 to v5, and it still exhibits the behaviour:

  • Example of failed run: https://github.com/pytorch/data/actions/runs/8903946672/job/24452473208?pr=1249
  • Failed retry with debug logs: https://github.com/pytorch/data/actions/runs/8903946672/job/24475084388

##[debug]Evaluating condition for step: 'Post Setup Python 3.9'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Post Setup Python 3.9
##[debug]Loading inputs
##[debug]Evaluating: matrix.python-version
##[debug]Evaluating Index:
##[debug]..Evaluating matrix:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'python-version'
##[debug]=> 3.9
##[debug]Result: 3.9
##[debug]Evaluating: (((github.server_url == 'https://github.com') && github.token) || '')
##[debug]Evaluating Or:
##[debug]..Evaluating And:
##[debug]....Evaluating Equal:
##[debug]......Evaluating Index:
##[debug]........Evaluating github:
##[debug]........=> Object
##[debug]........Evaluating String:
##[debug]........=> 'server_url'
##[debug]......=> 'https://github.com/'
##[debug]......Evaluating String:
##[debug]......=> 'https://github.com'
##[debug]....=> true
##[debug]....Evaluating Index:
##[debug]......Evaluating github:
##[debug]......=> Object
##[debug]......Evaluating String:
##[debug]......=> 'token'
##[debug]....=> '***'
##[debug]..=> '***'
##[debug]=> '***'
##[debug]Expanded: ((('https://github.com/' == 'https://github.com') && '***') || '')
##[debug]Result: '***'
##[debug]Loading env
Post job cleanup.
##[debug]Re-evaluate condition on job cancellation for step: 'Post Setup Python 3.9'.

Expected behavior: The Post Setup Python step finishes quickly and succeeds.

Actual behavior: The Post Setup Python step hangs and the run is marked as failed.

andrewkho, May 01 '24 16:05
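The debug log in the report above shows the `Post Setup Python 3.9` step starting, then "Post job cleanup.", then a re-evaluation on job cancellation, with nothing indicating that the step ever completed. A minimal Python sketch for spotting that pattern in a downloaded log is below. It assumes the runner marks step boundaries with `##[debug]Starting: <name>` and `##[debug]Finishing: <name>` lines; the `Finishing` marker does not appear in the excerpt above, so treat it as an assumption. `unfinished_steps` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Assumed step-boundary markers in a runner debug log; search() also
# tolerates a leading timestamp on each line.
_START = re.compile(r"##\[debug\]Starting: (.+)$")
_FINISH = re.compile(r"##\[debug\]Finishing: (.+)$")


def unfinished_steps(log_text: str) -> list[str]:
    """Return names of steps that have a Starting marker but no Finishing marker."""
    open_steps: list[str] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        start = _START.search(line)
        if start:
            open_steps.append(start.group(1))
            continue
        finish = _FINISH.search(line)
        if finish and finish.group(1) in open_steps:
            open_steps.remove(finish.group(1))
    return open_steps


if __name__ == "__main__":
    excerpt = "\n".join([
        "##[debug]Starting: Post Setup Python 3.9",
        "##[debug]Loading inputs",
        "Post job cleanup.",
        "##[debug]Re-evaluate condition on job cancellation for step: 'Post Setup Python 3.9'.",
    ])
    print(unfinished_steps(excerpt))  # ['Post Setup Python 3.9']
```

A step left in the returned list is the one that was still running when the job was cancelled, which narrows down where to look.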

Hello @andrewkho, thank you for creating this issue. We will investigate it and get back to you as soon as we have feedback.

HarithaVattikuti, May 01 '24 19:05

Hello @andrewkho, we have investigated the issue and we are not able to reproduce it with actions/setup-python@v3, v4, or v5. Please find the screenshots for reference. In the run provided in this issue, we noticed that the post checkout job isn't terminating as expected. This might be due to an external service not responding as expected, causing the job to hang. The workflow interacts with several external services:

1. PyTorch channels: The "Get PyTorch Channel" step determines the URL for either the test or nightly PyTorch builds hosted on https://download.pytorch.org/. This URL is later used in the "Install dependencies" step to install PyTorch.
2. GitHub: The "Check out source repository" step uses the actions/checkout@v4 GitHub Action to fetch the repository's source code.
3. PyPI (Python Package Index): Several steps install Python packages with pip, which fetches them from PyPI.

Any of these could potentially cause a hang if the service is down or there is an issue with the package or tool being fetched.

[Screenshots: 4 images]

Please let us know if you need any further clarification.

aparnajyothi-y, May 20 '24 10:05
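To test the external-service hypothesis in the comment above directly, a step placed just before the suspect steps could time a request to each service. A minimal Python sketch using only the standard library follows. The specific URLs (in particular `https://pypi.org/simple/` as the PyPI endpoint) and the helper name `probe` are illustrative assumptions, not part of the pytorch/data workflow.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request

# Illustrative endpoints for the services named above (assumed, not taken from the workflow).
SERVICES = {
    "PyTorch channels": "https://download.pytorch.org/",
    "GitHub": "https://github.com/",
    "PyPI": "https://pypi.org/simple/",
}


def probe(url: str, timeout: float = 10.0) -> tuple[bool, float, str]:
    """Request `url` and return (server_responded, seconds_elapsed, detail)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, time.monotonic() - start, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # A non-2xx status still means the server answered rather than hanging.
        return True, time.monotonic() - start, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, socket timeouts and connection errors are all OSError subclasses.
        return False, time.monotonic() - start, repr(exc)
```

Looping over `SERVICES` and printing each `probe(url)` result would show whether any endpoint was unreachable, or answered only after nearly `timeout` seconds, at the moment the job ran.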

Hi @aparnajyothi-y, thanks for trying to repro. I think the issue is that, as far as I can tell, there is no clear error message or way to debug this. For example, I have no idea what the container is doing: is the failure due to a timeout? If so, how long is the timeout? Or is it an OOM? It's really difficult to debug without anything to go on.

andrewkho, May 20 '24 18:05
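For the question in the comment above ("if it's a timeout, how long is it?"), one option for steps the workflow author controls (such as the dependency installs) is to run them with an explicit deadline, so a hang turns into a clear error with an elapsed time instead of a silent cancellation. A minimal Python sketch follows. `run_with_deadline` is a hypothetical helper, exit code 124 is borrowed from the coreutils `timeout` convention, and this approach cannot wrap an action's own post step, which the runner executes. At the workflow level, GitHub Actions also accepts `timeout-minutes` on jobs and steps.

```python
from __future__ import annotations

import subprocess
import sys
import time


def run_with_deadline(cmd: list[str], timeout_s: float) -> int:
    """Run `cmd`; if it exceeds `timeout_s`, kill it and report how long it ran."""
    start = time.monotonic()
    try:
        # subprocess.run kills and reaps the child before raising TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        elapsed = time.monotonic() - start
        # "::error::" is a GitHub Actions workflow command that surfaces an error annotation.
        print(f"::error::{cmd[0]} timed out after {elapsed:.1f}s (limit {timeout_s}s)")
        return 124
    return completed.returncode


if __name__ == "__main__":
    # Demo: a command that sleeps longer than its limit.
    rc = run_with_deadline([sys.executable, "-c", "import time; time.sleep(5)"], 1.0)
    print("exit code:", rc)
```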

Hello @andrewkho, to help investigate the error, could you please enable debug logs and run the workflow again? You can follow the steps in this document to do so. Once done, please share the link to the workflow run with the debug logs included. This will help us inspect the setup-python issue further, as we're currently unable to replicate the error.

aparnajyothi-y, May 27 '24 04:05
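For reference, step debug logging is turned on by setting a repository secret or variable named `ACTIONS_STEP_DEBUG` to `true` (`ACTIONS_RUNNER_DEBUG` enables additional runner diagnostics), or by re-running a job with "Enable debug logging" checked. When it is on, the runner sets `RUNNER_DEBUG=1`, and `::debug::` workflow commands printed by a step become visible in the log. A minimal Python sketch of emitting such messages from a workflow's own scripts follows; `gha_debug` and `debug_logging_enabled` are hypothetical helper names, and the `%`/CR/LF escaping mirrors what the official Actions toolkit applies to command data.

```python
import os


def _escape(data: str) -> str:
    # Workflow-command data must have %, CR and LF percent-encoded (% first).
    return data.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def gha_debug(message: str) -> str:
    """Print `message` as a GitHub Actions debug command and return the emitted line."""
    line = f"::debug::{_escape(message)}"
    print(line, flush=True)
    return line


def debug_logging_enabled() -> bool:
    """True when the runner reports that debug logging is on for this run."""
    return os.environ.get("RUNNER_DEBUG") == "1"


if __name__ == "__main__":
    gha_debug("starting dependency install")
    if debug_logging_enabled():
        # Extra context is only worth printing when someone asked for debug logs.
        for key in ("RUNNER_OS", "RUNNER_ARCH"):
            gha_debug(f"{key}={os.environ.get(key, '<unset>')}")
```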

Hello @andrewkho, could you share the link to the workflow run with the debug logs included? This will help us inspect the setup-python issue further, as we're currently unable to replicate the error.

aparnajyothi-y, Jun 20 '24 12:06

Hello @andrewkho, could you share the link to the workflow run with the debug logs included? This will help us inspect the setup-python issue further, as we're currently unable to replicate the error.

aparnajyothi-y, Jul 03 '24 13:07

Hello @andrewkho, we are closing this issue after two reminders, as we haven't heard back from you in a long time.

Please feel free to reach out to us to reopen this issue if you need any further support or clarification. Thank you.

aparnajyothi-y, Jul 10 '24 07:07