
VS installer exited with code -1 flakily when building Windows binaries

Open huydhn opened this issue 1 year ago • 5 comments

I'm currently seeing quite a number of flaky failures when building Windows binaries in trunk, for example https://github.com/pytorch/pytorch/actions/runs/4744014597/jobs/8424319388

The error points to this step, https://github.com/pytorch/builder/blob/main/windows/internal/vs2022_install.ps1#L42, in which vs_installer.exe is installed. The exact error is "VS installer exited with code -1", whereas the exit code should be one of [0, 3010]. I have already tried disabling Windows Defender there (https://github.com/pytorch/pytorch/pull/99389), but it doesn't seem to help.
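For reference, the exit-code check described above could look roughly like the sketch below. `Test-VsInstallerExitCode` is a hypothetical helper name used only for illustration, not the code that is actually in vs2022_install.ps1. It relies on the standard Windows convention that 3010 means "success, reboot required".

```powershell
# Hypothetical helper for illustration; not the actual code in vs2022_install.ps1.
# Accepted codes come from the error message: 0 (success) and
# 3010 (success, reboot required).
function Test-VsInstallerExitCode {
    param([int]$ExitCode)
    $accepted = @(0, 3010)
    return $accepted -contains $ExitCode
}

# The flaky runs report -1, which is not in the accepted list.
$flakyCode = -1
if (-not (Test-VsInstallerExitCode -ExitCode $flakyCode)) {
    Write-Host "VS installer exited with code $flakyCode, which should be one of [0, 3010]"
}
```

Any other value, including the -1 seen here, would be treated as a failed install.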

Another minor bug is in the step that copies vslogs.zip at https://github.com/pytorch/builder/blob/main/windows/internal/vs2022_install.ps1#L54. The correct path should be C:\Users\${env:USERNAME}\AppData\Local\Temp\vslogs.zip, as the user is now runneruser instead of circleci. The failed copy hides the above error.
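A minimal sketch of the path fix, assuming the log archive keeps the same name and location under the current user's profile. `Get-VsLogsZipPath` is a hypothetical helper for illustration, not a function in the builder repo.

```powershell
# Hypothetical helper for illustration; the real script hard-codes the path.
# Builds the log path from the current user instead of assuming "circleci".
function Get-VsLogsZipPath {
    param([string]$UserName = $env:USERNAME)
    return "C:\Users\$UserName\AppData\Local\Temp\vslogs.zip"
}

$logPath = Get-VsLogsZipPath
if (Test-Path -LiteralPath $logPath) {
    Write-Host "Found VS logs at $logPath"
} else {
    # Warn rather than fail, so a missing archive does not mask the installer error.
    Write-Warning "VS logs not found at $logPath"
}
```

With the user set to runneruser, this resolves to C:\Users\runneruser\AppData\Local\Temp\vslogs.zip, which is where the workflow writes the archive.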

cc @atalman @malfet @Blackhex

huydhn avatar Apr 19 '23 16:04 huydhn

VS2022 should be part of the AMI, shouldn't it?

malfet avatar Apr 19 '23 16:04 malfet

It looks like there is a gap here. The installation script used by the AMI, https://github.com/pytorch/test-infra/blob/main/aws/ami/windows/scripts/Installers/Install-VS.ps1#L34, looks older and still uses VS2019. Thus it makes sense that VS2022 is installed every time.

huydhn avatar Apr 19 '23 17:04 huydhn

Note that there is a pending PR, pytorch/test-infra#1175, that should update VS on the AMI. I haven't touched it for a while, but I can revive it if needed.

Blackhex avatar Apr 19 '23 17:04 Blackhex

Also note that there might be a bug in collecting the VS logs, which would be helpful for reporting the issue:

The workflow compresses the logs into the C:\Users\runneruser\AppData\Local\Temp\vslogs.zip file, but then the copy command fails with:

Copy-Item : Cannot find path 'C:\Users\circleci\AppData\Local\Temp\vslogs.zip' because it does not exist. 

Blackhex avatar Apr 19 '23 17:04 Blackhex

To summarize my chat with @malfet on the issue:

  1. Does this issue only happen with VS2022? If yes, could we roll back to VS2019 for the time being, as it matches what is currently in the AMI?
  2. Eventually we can use VS2022, but it would need to be part of the AMI (https://github.com/pytorch/test-infra/pull/1175). cc @atalman I remember that you are testing a new Windows AMI; is it possible to include this change too?

huydhn avatar Apr 19 '23 19:04 huydhn