build
build copied to clipboard
Enable Windows Arm64 tests in CI
Node CI had a pipeline for testing Windows Arm64 platform sometimes back but it got disabled as some tests were failing. See https://github.com/nodejs/build/issues/2540#issuecomment-1030991902
Linaro has been looking at reviving the CI job and now we have a new standalone job ( https://ci.nodejs.org/job/windows-arm-simple/ ) that is running with all tests passing.
As discussed during the build meet-up in https://github.com/nodejs/build/issues/2994, Re-enabling the CI again would be crucial to avoid regressions and to officially support the Windows Arm64 platform as requested in https://github.com/nodejs/build/issues/2540
We will need to re-enable the Windows Arm64 test stage to https://ci.nodejs.org/job/node-test-binary-windows-js-suites/
Cc @pbo-linaro, @joaocgreis, @mhdawson, @sxa
Moving the email thread to GitHub issue for visibility and tracking
Job windows-arm-simple is now set to run daily, to evaluate stability in time. For now, build for arm64 is already done here, but not testing.
@pbo-linaro Last nightly run was failing https://ci.nodejs.org/job/windows-arm-simple/nodes=win10-vs2019-arm64/39/. Could you have a look ?
After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.
Those tests are concerned:
- sequential/test-watch-mode-inspect.mjs
- sequential/test-watch-mode.mjs
After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.
Those tests are concerned:
- sequential/test-watch-mode-inspect.mjs
- sequential/test-watch-mode.mjs
Thanks, @pbo-linaro ! Is it possible to fix the tests to not depend on a particular order?
I'm trying to contact author of those commits to see if that is normal, and what we can do with it.
Have you got a link to the relevant commits/PRs for reference - is it the ones that added those tests? And do they have the same problem on win-x64?
The bug theme of this year in multiple projects I've been on has been sorting/ordering problems!
PR:
- https://github.com/nodejs/node/pull/44551
- https://github.com/nodejs/node/pull/44621
@MoLow (author). Do you know if something could create trouble with this?
On my machine, I can make some failure appear by increasing the charge of machine, so something clearly has problem with async dependencies. Why does it appear much more on Windows on Arm machine? Honestly, I don't know, because they are slower, or the presence of fast/slow cores, that create some specific race conditions.
@sxa I suppose win-x64 CI did not fail with this, else those tests should already been marked as flaky. On our WoA machines, we reproduced that with x64 binaries (instead of arm64), so it's not architecture/code specific.
After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.
Those tests are concerned:
- sequential/test-watch-mode-inspect.mjs
- sequential/test-watch-mode.mjs
https://github.com/nodejs/node/issues/44898?
Thanks @richardlau, I missed that one, but that's exactly the issue.
Not sure if it's the correct thread to report this, but I'm getting insufficient disk space to complete link on https://ci.nodejs.org/job/node-compile-windows/47458/nodes=win-vs2019-arm64/, which blocks https://github.com/nodejs/node/pull/44250 from landing.
@aduh95 It is not built on an arm64 machine, but on this agent: test-rackspace-win2012r2_vs2019-x64-1
you can probably open another issue, or simply ask on nodejs slack build channel, if you have access to it.
ARM64 Windows was never enabled by default in the main node-test-* jobs. It has been there as an option but has to be explicitly selected.
I'm not sure we can enable it by default for all runs with the current resources we have now. The machines we have might be enough, but I expect many jobs will queue up when CI is under load. Also, if the machines fail, it would be good to have someone around who knows how to disable only ARM64, without disabling the whole Windows job.
I suggest we try enabling it, but be ready to disable it quickly if it starts causing issues. We only need to edit the main fanned job and change the default value of RUN_ARM64_TESTS to be set by default.
If tests are failing on master, at some point the procedure was to mark them flaky in https://github.com/nodejs/node/blob/main/test/sequential/sequential.status until a fix lands.
I tried running CI for v18.11.0 with ARM64 enabled, but it didn't run any of the relevant test-binary configurations. For example for https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/: I see RUN_ARM64_TESTS is true in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/parameters/ but it is considered false in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/console . I checked the safeParameters list and it was not there, so I added it. I believe Jenkins needs to be restarted to update that variable, I'll do that over the weekend if it doesn't happen before. Anyway, unless that variable was recently updated, the cause might be anything else.
That would be very nice to start running them. We can try for 24 hours, and see if jobs queue gets larger and larger.
For flaky tests reported here, I created this PR https://github.com/nodejs/node/pull/45049. Different platforms reported problems, so I assume it was good to mark them flaky for all of them.
@joaocgreis To clarify things a bit more, neither @nsait-linaro nor me have access to what you describe, so that would be nice if you could change that value to enable tests for this platform.
Sounds good. Let's start the run and we can keep an eye on the load and stability.
We can add more machines/agents if more resources are required.
@pbo-linaro I guess we have to wait for your PR https://github.com/nodejs/node/pull/45049 to land before setting RUN_ARM64_TESTS. @joaocgreis or @sxa - I hope you can help with that.
Thanks to https://github.com/nodejs/node/pull/45049 merged, the windows-arm-simple is now green again! 👍
Can we start enabling official job for this platform (despite current limitation with only two machines available)? Thanks
Builds have been working fine for the last few days, I think we should enable it and see it in action.
@joaocgreis @sxa Could you please help to enable it ?
Hi - yes I've been tied up with release work elsewhere but I should be able to look at this in the next few days, under the caveat that if there are any problems with it the first course of action will likely be to disable it again immediately to avoid any problems with contributors being able to run PR tests :-)
Yes sure, we don't want to block anything for contributors! The windows-arm-simple job has been green since we fixed it, so it should be fine.
Just to give an update from my side, I still don't understand why the test-binary jobs don't invoke any of the ARM configurations, the fix I tried above didn't work. Enabling the checkbox by default will not do anything until we're able to fix this.
@joaocgreis Is there someone responsible for NodeJS jenkins instance that could help to debug this configuration?
I'm back from vacation now :-) I had a shot at re-enabling in the CI and it's running at https://ci.nodejs.org/job/node-compile-windows/nodes=win-vs2019-arm64/ albeit a bit slow so is causing some delays (I just re-enabled the cross-compiled jobs that were already in the system to get something running) but that should be resolved when https://github.com/nodejs/node/pull/45432 goes in.
We should probably also persue native builds (consistent with what we've been doing in https://ci.nodejs.org/job/windows-arm-simple/) potentially as a separate pipeline for now so should work on getting https://ci.nodejs.org/job/node-test-commit-windows-arm running and use that as our primary build+test pipeline for windows/arm instead of the cross-compilation now.
Thanks @sxa. Is there anything we can do to help with this? Seems like its jenkins configuration work, for which we don't have access for now.
We should probably also persue native builds (consistent with what we've been doing in https://ci.nodejs.org/job/windows-arm-simple/) potentially as a separate pipeline for now so should work on getting https://ci.nodejs.org/job/node-test-commit-windows-arm running and use that as our primary build+test pipeline for windows/arm instead of the cross-compilation now.
Yes, who has the privilege to set it up?
@sxa @joaocgreis @pbo-linaro @StefanStojanovic Can we have a sync call to figure out the next steps for this? This has been pending for quite some time and I think we could probably find a way forward with a short sync-up.
There was a bug which hadn't been spotted from when we upgraded the jenkins server to Java 11 that prevented the RUN_ARM64_TESTS flag from being honoured. I've got https://github.com/nodejs/build/pull/3097/files that will fix it, then we can do some runs of it using the existing "fanned" jobs - I'm running some of those just now in a private job to verify that they work
Second step is to try and get this running as a "normal" (non-fanned) combined build+test job like we do on most of the other platforms, which is what node-test-commit-window-arm will do - with the limited amount of machines here, and certainly not the multitude of windows versions that we have for x64 it doesn't make too much sense to use the fanned jobs.
It's likely to be trivial stuff to fix given that we've got this all working in the simple job (the current thing that's stopping it is the absence of JENKINS_TMP_KEY which https://github.com/nodejs/build/blob/86235552f41058f1b0e5f33f66403ec3ee7fea49/jenkins/scripts/windows/git-checkout.cmd#L12 expects.
@joaocgreis @StefanStojanovic I'm guessing we don't have a natural path through the Windows build scripts that be activated for a combined build+test at the moment (e.g. excluding the need for JENKINS_TMP_KEY?
@sxa great work fixing the RUN_ARM64_TESTS issue! Thanks for getting to that! With that in place, we can just go ahead and toggle the default on the checkbox.
Second step is to try and get this running as a "normal" (non-fanned) combined build+test job like we do on most of the other platforms, which is what node-test-commit-window-arm will do - with the limited amount of machines here, and certainly not the multitude of windows versions that we have for x64 it doesn't make too much sense to use the fanned jobs.
The goal of this is to test native compilation right? I see cross-compilation as more important because it's what will be used to build releases. Unless we add machines to release as well, but I don't know any advantage of doing that. IIUC, final binaries should be similar, either cross or natively compiled.
Cross-compilation will be fully tested as soon as we make the RUN_ARM64_TESTS variable true by default. No other changes are needed.
For native compilation, the easiest path for now would be to add a new job (node-test-commit-windows-arm?) directly under node-test-commit, running side-by-side with windows-fanned. Eventually, we'll need more Win/VS versions so moving to fanned would be good, but for now it's ok. The only concern I have with this is that we don't have enough ARM64 machines, so this might have a huge impact on CI run time, because the test jobs of the fanned job will be using the same machines. Can we get the new job ready but hold on adding to node-test-commit until we see how the fanned job experiment goes?