CI: Windows GPU runners do not stop on error

Open leofang opened this issue 10 months ago • 1 comments

In this CI run we hit a bizarre "NVRTC not found" error at test time, even though NVRTC should have been installed before the tests ran. It turns out that PowerShell swallows any pip install failure (this happens because of #482), so the dependencies (including NVRTC) were never installed successfully: https://github.com/NVIDIA/cuda-python/actions/runs/13623016730/job/38075976144?pr=423#step:18:39

It looks like we hit a known runner issue, which was closed without a proper fix: https://github.com/actions/runner-images/issues/6668 (the recommendation there was to switch to the bash shell. I'd love to do this too, as it would mean we no longer have to maintain two versions of the workflows, but it is not possible on GH-hosted Windows GPU runners)
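One way to surface a silently failed install before the tests start is to check that the expected distributions are actually present. The following is only a minimal sketch of that idea, not something the workflows currently do: `missing_distributions` is a hypothetical helper, and the names passed to it would have to be whatever packages the install step is supposed to provide.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the subset of `names` that is not installed in this environment."""
    missing = []
    for name in names:
        try:
            # metadata.version raises PackageNotFoundError when the
            # distribution is not installed.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

A CI step could call this with its required package names and exit with a non-zero status if the returned list is not empty, so the job stops at the install step rather than at test time.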

leofang, Mar 03 '25 04:03

Not sure if setting this at the beginning of a workflow would help:

# Stop the script when a cmdlet or a native command fails
$ErrorActionPreference = 'Stop'
$PSNativeCommandUseErrorActionPreference = $true
  • https://stackoverflow.com/a/9949105/2344149
  • https://www.meziantou.net/stop-the-script-when-an-error-occurs-in-powershell.htm
  • https://github.com/kachick/dotfiles/pull/806
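Note that `$PSNativeCommandUseErrorActionPreference` is only honored by newer PowerShell 7 releases and not by Windows PowerShell 5.1, so it may not be available on every runner. A shell-independent alternative is to do the install from a small Python wrapper that raises on a non-zero exit code. This is a sketch under that assumption: `run_checked` and `pip_install` are hypothetical helper names, not part of the existing workflows.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run `cmd` (a list of arguments) and raise if it exits non-zero."""
    # check=True makes subprocess.run raise CalledProcessError on failure
    # instead of returning a result object that the caller might ignore.
    return subprocess.run(cmd, check=True)


def pip_install(packages):
    """Install `packages` with the current interpreter's pip, failing loudly."""
    return run_checked([sys.executable, "-m", "pip", "install", *packages])
```

Because the uncaught `CalledProcessError` makes the Python process itself exit with a non-zero status, the CI step fails regardless of how the surrounding shell treats native command errors, provided the shell propagates the exit code of the last command.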

leofang, Mar 03 '25 04:03

Now that we have moved to our own hosted runners, we are using bash to execute the workflows, and I can see the tests properly failing/exiting as expected. We can open follow-up issues for any specific cases. Getting to a common bash runner is helpful for a number of reasons.

cryos, Oct 02 '25 18:10