build icon indicating copy to clipboard operation
build copied to clipboard

Enable Windows Arm64 tests in CI

Open niyas-sait opened this issue 3 years ago • 13 comments

Node CI had a pipeline for testing Windows Arm64 platform sometimes back but it got disabled as some tests were failing. See https://github.com/nodejs/build/issues/2540#issuecomment-1030991902

Linaro has been looking at reviving the CI job and now we have a new standalone job ( https://ci.nodejs.org/job/windows-arm-simple/ ) that is running with all tests passing.

As discussed during the build meet-up in https://github.com/nodejs/build/issues/2994, Re-enabling the CI again would be crucial to avoid regressions and to officially support the Windows Arm64 platform as requested in https://github.com/nodejs/build/issues/2540

We will need to re-enable the Windows Arm64 test stage to https://ci.nodejs.org/job/node-test-binary-windows-js-suites/

niyas-sait avatar Oct 05 '22 08:10 niyas-sait

Cc @pbo-linaro, @joaocgreis, @mhdawson, @sxa

Moving the email thread to GitHub issue for visibility and tracking

niyas-sait avatar Oct 05 '22 08:10 niyas-sait

Job windows-arm-simple is now set to run daily, to evaluate stability in time. For now, build for arm64 is already done here, but not testing.

pbo-linaro avatar Oct 05 '22 10:10 pbo-linaro

@pbo-linaro Last nightly run was failing https://ci.nodejs.org/job/windows-arm-simple/nodes=win10-vs2019-arm64/39/. Could you have a look ?

niyas-sait avatar Oct 05 '22 11:10 niyas-sait

After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.

Those tests are concerned:

  • sequential/test-watch-mode-inspect.mjs
  • sequential/test-watch-mode.mjs

pbo-linaro avatar Oct 12 '22 13:10 pbo-linaro

After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.

Those tests are concerned:

  • sequential/test-watch-mode-inspect.mjs
  • sequential/test-watch-mode.mjs

Thanks, @pbo-linaro ! Is it possible to fix the tests to not depend on a particular order?

niyas-sait avatar Oct 12 '22 13:10 niyas-sait

I'm trying to contact author of those commits to see if that is normal, and what we can do with it.

pbo-linaro avatar Oct 12 '22 13:10 pbo-linaro

Have you got a link to the relevant commits/PRs for reference - is it the ones that added those tests? And do they have the same problem on win-x64?

The bug theme of this year in multiple projects I've been on has been sorting/ordering problems!

sxa avatar Oct 12 '22 14:10 sxa

PR:

  • https://github.com/nodejs/node/pull/44551
  • https://github.com/nodejs/node/pull/44621

@MoLow (author). Do you know if something could create trouble with this?

On my machine, I can make some failure appear by increasing the charge of machine, so something clearly has problem with async dependencies. Why does it appear much more on Windows on Arm machine? Honestly, I don't know, because they are slower, or the presence of fast/slow cores, that create some specific race conditions.

pbo-linaro avatar Oct 12 '22 14:10 pbo-linaro

@sxa I suppose win-x64 CI did not fail with this, else those tests should already been marked as flaky. On our WoA machines, we reproduced that with x64 binaries (instead of arm64), so it's not architecture/code specific.

pbo-linaro avatar Oct 12 '22 14:10 pbo-linaro

After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.

Those tests are concerned:

  • sequential/test-watch-mode-inspect.mjs
  • sequential/test-watch-mode.mjs

https://github.com/nodejs/node/issues/44898?

richardlau avatar Oct 12 '22 14:10 richardlau

Thanks @richardlau, I missed that one, but that's exactly the issue.

pbo-linaro avatar Oct 12 '22 14:10 pbo-linaro

Not sure if it's the correct thread to report this, but I'm getting insufficient disk space to complete link on https://ci.nodejs.org/job/node-compile-windows/47458/nodes=win-vs2019-arm64/, which blocks https://github.com/nodejs/node/pull/44250 from landing.

aduh95 avatar Oct 13 '22 15:10 aduh95

@aduh95 It is not built on an arm64 machine, but on this agent: test-rackspace-win2012r2_vs2019-x64-1

you can probably open another issue, or simply ask on nodejs slack build channel, if you have access to it.

pbo-linaro avatar Oct 13 '22 15:10 pbo-linaro

ARM64 Windows was never enabled by default in the main node-test-* jobs. It has been there as an option but has to be explicitly selected.

I'm not sure we can enable it by default for all runs with the current resources we have now. The machines we have might be enough, but I expect many jobs will queue up when CI is under load. Also, if the machines fail, it would be good to have someone around who knows how to disable only ARM64, without disabling the whole Windows job.

I suggest we try enabling it, but be ready to disable it quickly if it starts causing issues. We only need to edit the main fanned job and change the default value of RUN_ARM64_TESTS to be set by default.

If tests are failing on master, at some point the procedure was to mark them flaky in https://github.com/nodejs/node/blob/main/test/sequential/sequential.status until a fix lands.

I tried running CI for v18.11.0 with ARM64 enabled, but it didn't run any of the relevant test-binary configurations. For example for https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/: I see RUN_ARM64_TESTS is true in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/parameters/ but it is considered false in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/console . I checked the safeParameters list and it was not there, so I added it. I believe Jenkins needs to be restarted to update that variable, I'll do that over the weekend if it doesn't happen before. Anyway, unless that variable was recently updated, the cause might be anything else.

joaocgreis avatar Oct 17 '22 22:10 joaocgreis

That would be very nice to start running them. We can try for 24 hours, and see if jobs queue gets larger and larger.

For flaky tests reported here, I created this PR https://github.com/nodejs/node/pull/45049. Different platforms reported problems, so I assume it was good to mark them flaky for all of them.

pbo-linaro avatar Oct 18 '22 07:10 pbo-linaro

@joaocgreis To clarify things a bit more, neither @nsait-linaro nor me have access to what you describe, so that would be nice if you could change that value to enable tests for this platform.

pbo-linaro avatar Oct 18 '22 08:10 pbo-linaro

Sounds good. Let's start the run and we can keep an eye on the load and stability.

We can add more machines/agents if more resources are required.

@pbo-linaro I guess we have to wait for your PR https://github.com/nodejs/node/pull/45049 to land before setting RUN_ARM64_TESTS. @joaocgreis or @sxa - I hope you can help with that.

niyas-sait avatar Oct 18 '22 08:10 niyas-sait

Thanks to https://github.com/nodejs/node/pull/45049 merged, the windows-arm-simple is now green again! 👍

Can we start enabling official job for this platform (despite current limitation with only two machines available)? Thanks

pbo-linaro avatar Oct 20 '22 10:10 pbo-linaro

Builds have been working fine for the last few days, I think we should enable it and see it in action.

@joaocgreis @sxa Could you please help to enable it ?

niyas-sait avatar Oct 25 '22 06:10 niyas-sait

Hi - yes I've been tied up with release work elsewhere but I should be able to look at this in the next few days, under the caveat that if there are any problems with it the first course of action will likely be to disable it again immediately to avoid any problems with contributors being able to run PR tests :-)

sxa avatar Nov 08 '22 17:11 sxa

Yes sure, we don't want to block anything for contributors! The windows-arm-simple job has been green since we fixed it, so it should be fine.

pbo-linaro avatar Nov 09 '22 07:11 pbo-linaro

Just to give an update from my side, I still don't understand why the test-binary jobs don't invoke any of the ARM configurations, the fix I tried above didn't work. Enabling the checkbox by default will not do anything until we're able to fix this.

joaocgreis avatar Nov 11 '22 12:11 joaocgreis

@joaocgreis Is there someone responsible for NodeJS jenkins instance that could help to debug this configuration?

pbo-linaro avatar Nov 15 '22 08:11 pbo-linaro

I'm back from vacation now :-) I had a shot at re-enabling in the CI and it's running at https://ci.nodejs.org/job/node-compile-windows/nodes=win-vs2019-arm64/ albeit a bit slow so is causing some delays (I just re-enabled the cross-compiled jobs that were already in the system to get something running) but that should be resolved when https://github.com/nodejs/node/pull/45432 goes in.

We should probably also persue native builds (consistent with what we've been doing in https://ci.nodejs.org/job/windows-arm-simple/) potentially as a separate pipeline for now so should work on getting https://ci.nodejs.org/job/node-test-commit-windows-arm running and use that as our primary build+test pipeline for windows/arm instead of the cross-compilation now.

sxa avatar Nov 15 '22 12:11 sxa

Thanks @sxa. Is there anything we can do to help with this? Seems like its jenkins configuration work, for which we don't have access for now.

pbo-linaro avatar Nov 15 '22 16:11 pbo-linaro

We should probably also persue native builds (consistent with what we've been doing in https://ci.nodejs.org/job/windows-arm-simple/) potentially as a separate pipeline for now so should work on getting https://ci.nodejs.org/job/node-test-commit-windows-arm running and use that as our primary build+test pipeline for windows/arm instead of the cross-compilation now.

Yes, who has the privilege to set it up?

niyas-sait avatar Nov 16 '22 09:11 niyas-sait

@sxa @joaocgreis @pbo-linaro @StefanStojanovic Can we have a sync call to figure out the next steps for this? This has been pending for quite some time and I think we could probably find a way forward with a short sync-up.

niyas-sait avatar Nov 21 '22 10:11 niyas-sait

There was a bug which hadn't been spotted from when we upgraded the jenkins server to Java 11 that prevented the RUN_ARM64_TESTS flag from being honoured. I've got https://github.com/nodejs/build/pull/3097/files that will fix it, then we can do some runs of it using the existing "fanned" jobs - I'm running some of those just now in a private job to verify that they work

sxa avatar Nov 24 '22 18:11 sxa

Second step is to try and get this running as a "normal" (non-fanned) combined build+test job like we do on most of the other platforms, which is what node-test-commit-window-arm will do - with the limited amount of machines here, and certainly not the multitude of windows versions that we have for x64 it doesn't make too much sense to use the fanned jobs.

It's likely to be trivial stuff to fix given that we've got this all working in the simple job (the current thing that's stopping it is the absence of JENKINS_TMP_KEY which https://github.com/nodejs/build/blob/86235552f41058f1b0e5f33f66403ec3ee7fea49/jenkins/scripts/windows/git-checkout.cmd#L12 expects.

@joaocgreis @StefanStojanovic I'm guessing we don't have a natural path through the Windows build scripts that be activated for a combined build+test at the moment (e.g. excluding the need for JENKINS_TMP_KEY?

sxa avatar Nov 24 '22 18:11 sxa

@sxa great work fixing the RUN_ARM64_TESTS issue! Thanks for getting to that! With that in place, we can just go ahead and toggle the default on the checkbox.

Second step is to try and get this running as a "normal" (non-fanned) combined build+test job like we do on most of the other platforms, which is what node-test-commit-window-arm will do - with the limited amount of machines here, and certainly not the multitude of windows versions that we have for x64 it doesn't make too much sense to use the fanned jobs.

The goal of this is to test native compilation right? I see cross-compilation as more important because it's what will be used to build releases. Unless we add machines to release as well, but I don't know any advantage of doing that. IIUC, final binaries should be similar, either cross or natively compiled.

Cross-compilation will be fully tested as soon as we make the RUN_ARM64_TESTS variable true by default. No other changes are needed.

For native compilation, the easiest path for now would be to add a new job (node-test-commit-windows-arm?) directly under node-test-commit, running side-by-side with windows-fanned. Eventually, we'll need more Win/VS versions so moving to fanned would be good, but for now it's ok. The only concern I have with this is that we don't have enough ARM64 machines, so this might have a huge impact on CI run time, because the test jobs of the fanned job will be using the same machines. Can we get the new job ready but hold on adding to node-test-commit until we see how the fanned job experiment goes?

joaocgreis avatar Nov 25 '22 03:11 joaocgreis