setup-python
Intermittent failures during Post Setup Python step for macOS
I'm new to GitHub Actions and I'm having trouble understanding this failure. Apologies if this isn't the right way to flag the issue.
Description: Post Setup Python fails intermittently with macos-latest. On successful runs it's much slower to clean up / shut down than on Windows and Linux.
Action version: Tested with Actions v3/v4 and setup-python v4/v5
Platform:
- [ ] Ubuntu
- [x] macOS
- [ ] Windows
Runner type:
- [x] Hosted
- [ ] Self-hosted
Tools version: 3.8, 3.9, 3.10
Repro steps:
The original workflow yaml is here: https://github.com/pytorch/data/blob/main/.github/workflows/stateful_dataloader_ci.yml
In this failed run I tried updating actions from v3 -> v4 and setup-python from v4 -> v5, and it still exhibits the behaviour:
Example of failed run: https://github.com/pytorch/data/actions/runs/8903946672/job/24452473208?pr=1249
Failed retry with debug logs: https://github.com/pytorch/data/actions/runs/8903946672/job/24475084388
##[debug]Evaluating condition for step: 'Post Setup Python 3.9'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Post Setup Python 3.9
##[debug]Loading inputs
##[debug]Evaluating: matrix.python-version
##[debug]Evaluating Index:
##[debug]..Evaluating matrix:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'python-version'
##[debug]=> 3.9
##[debug]Result: 3.9
##[debug]Evaluating: (((github.server_url == 'https://github.com') && github.token) || '')
##[debug]Evaluating Or:
##[debug]..Evaluating And:
##[debug]....Evaluating Equal:
##[debug]......Evaluating Index:
##[debug]........Evaluating github:
##[debug]........=> Object
##[debug]........Evaluating String:
##[debug]........=> 'server_url'
##[debug]......=> 'https://github.com/'
##[debug]......Evaluating String:
##[debug]......=> 'https://github.com'
##[debug]....=> true
##[debug]....Evaluating Index:
##[debug]......Evaluating github:
##[debug]......=> Object
##[debug]......Evaluating String:
##[debug]......=> 'token'
##[debug]....=> '***'
##[debug]..=> '***'
##[debug]=> '***'
##[debug]Expanded: ((('https://github.com/' == 'https://github.com') && '***') || '')
##[debug]Result: '***'
##[debug]Loading env
Post job cleanup.
##[debug]Re-evaluate condition on job cancellation for step: 'Post Setup Python 3.9'.
Expected behavior: Expect Post Setup-Python to finish quickly and succeed.
Actual behavior: Post Setup-Python hangs and marks the run as failed.
Hello @andrewkho Thank you for creating this issue. We will investigate it and get back to you as soon as we have some feedback.
Hello @andrewkho, we have investigated the issue and we are not able to reproduce it with actions/setup-python@v3, v4, or v5. Please find the screenshots for reference.
We have noticed in the run provided in this issue that the post checkout job isn't terminating as expected. It might be due to an external service not responding as expected, causing the job to hang. Moreover, the workflow provided does interact with a few external services:
1. PyTorch Channels: The step "Get PyTorch Channel" determines the URL for either the test or nightly PyTorch builds hosted on "https://download.pytorch.org/". This URL is later used in the "Install dependencies" step to install PyTorch.
2. GitHub: The step "Check out source repository" uses the actions/checkout@v4 GitHub Action to fetch the source code of the repository.
3. PyPI (Python Package Index): Several steps in the workflow involve installing Python packages using pip, which fetches packages from PyPI.
Any of these could potentially cause a hang if the service is down, or if there's an issue with the package/tool being fetched.
Please let us know in case of any further clarifications needed.
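If an unresponsive external service is the suspect, one way to narrow it down is to give each network-bound step its own `timeout-minutes`, so a stalled download fails that step with a known bound instead of hanging the job. The fragment below is only a sketch: the step names follow the ones mentioned above, but the job name, the limits, the `requirements.txt` file, and the pip command are assumptions, not the contents of the actual workflow. It is also not certain that a step-level limit covers an action's post-job cleanup, so this bounds the main steps rather than the "Post Setup Python" step itself.

```yaml
# Sketch only: job name, limits, and the install command are assumptions.
jobs:
  test:
    runs-on: macos-latest
    steps:
      - name: Check out source repository
        uses: actions/checkout@v4
        timeout-minutes: 5    # fail this step if the fetch from GitHub stalls

      - name: Setup Python 3.9
        uses: actions/setup-python@v5
        with:
          python-version: "3.9"
        timeout-minutes: 10

      - name: Install dependencies
        timeout-minutes: 15
        run: |
          # --timeout is pip's socket timeout in seconds;
          # --retries caps how many times pip retries a connection.
          # requirements.txt is a hypothetical file name.
          python -m pip install --timeout 30 --retries 3 -r requirements.txt
```

With limits like these, the step that exceeds its bound is the one marked as failed in the run summary, which at least points at which service to look into.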
Hi @aparnajyothi-y, thanks for trying to repro. I think the issue is that there is no clear error message or way to debug this, as far as I can tell. For example, I have no idea what the container is doing, or whether the failure is due to a timeout. If it is a timeout, how long is it? Or is it an OOM? It's really difficult to debug without anything to go on.
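One possible way to get something to go on, sketched under assumptions: set an explicit job-level `timeout-minutes` (GitHub's documented default for a job is 360 minutes) so that the timeout question has a known answer, and add a diagnostic step that prints memory and process state before the post-job cleanup begins. The job name, the 30-minute value, and the diagnostic step are hypothetical. `vm_stat` and `ps` are standard macOS tools, but whether their output would actually explain this particular hang is unknown.

```yaml
# Sketch only: the limit and the diagnostic step are assumptions.
jobs:
  test:
    runs-on: macos-latest
    timeout-minutes: 30    # assumed value; the default job limit is 360 minutes
    steps:
      # ... existing steps from the workflow go here ...

      - name: Dump runner state (hypothetical diagnostic step)
        if: always()       # run even when an earlier step failed
        run: |
          # Virtual memory statistics, to look for signs of memory pressure
          vm_stat
          # Every process with resident memory (KB) and elapsed time,
          # to spot leftover processes that could delay cleanup
          ps -A -o pid,rss,etime,command
```

Post steps run after all main steps, so a step like this captures the runner's state just before "Post Setup Python" starts rather than during the hang itself.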
Hello @andrewkho, to help investigate the error message, could you please enable debug logs and run the workflow? You can follow the steps in this document to do so. Once done, kindly update the link to the repository with the debug logs included. This will assist in further inspection of the setup-python issue mentioned above, as we're currently unable to replicate the error.
Hello @andrewkho, could you share the link to the workflow run with the debug logs included? This will assist in further inspection of the setup-python issue mentioned above, as we're currently unable to replicate the error.
Hello @andrewkho, could you share the link to the workflow run with the debug logs included? This will assist in further inspection of the setup-python issue mentioned above, as we're currently unable to replicate the error.
Hello @andrewkho, we are proceeding to close this issue after two reminders, as we haven't heard back for a long time.
Please feel free to reach out to us to reopen this issue if you need any further support or clarification. Thank you.