Test new Generic Worker pool (do not merge to master)

Open petemoore opened this issue 1 year ago • 9 comments

This tests the task graph under Generic Worker using the new pool created in https://github.com/taskcluster/community-tc-config/pull/775.

Note: this PR is not intended to be merged; it exists only to check whether all the tests pass.

If all goes well, we can just replace the old pool with the new one.

petemoore avatar Jul 02 '24 14:07 petemoore

Well that didn't get very far 🤣

+ /home/test/start.sh https://github.com/web-platform-tests/wpt.git refs/pull/46967/merge
+ REMOTE=https://github.com/web-platform-tests/wpt.git
+ REF=refs/pull/46967/merge
+ cd /home/test
+ '[' -e /dev/kvm ']'
+ sudo chmod a+rw /dev/kvm
chmod: changing permissions of '/dev/kvm': Operation not permitted
+ exit_code=1
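
To make the failure mode in the trace above easier to diagnose, here is a minimal sketch (not part of start.sh) that separates "device missing" from "device present but not accessible". The function name `kvm_status` and its return strings are hypothetical, chosen only for illustration.

```python
import os


def kvm_status(path="/dev/kvm"):
    """Classify a device node as 'missing', 'usable', or 'no-access'.

    Hypothetical helper: it only inspects the path and never tries to
    change permissions, unlike the chmod step in the trace.
    """
    if not os.path.exists(path):
        return "missing"
    # Check read/write access for the current user.
    if os.access(path, os.R_OK | os.W_OK):
        return "usable"
    return "no-access"


if __name__ == "__main__":
    print(kvm_status())
```

Running something like this inside the container before the chmod would show whether the permission change is needed at all in a given environment.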

petemoore avatar Jul 02 '24 14:07 petemoore

That comes from the requirement to run the android emulator. I'm not sure what the fxci taskcluster instance is doing but in theory we could copy that approach.

jgraham avatar Jul 03 '24 08:07 jgraham

Trying again, this time manually switching to docker instead of podman: https://community-tc.services.mozilla.com/tasks/Vv9PGqkdSTKzAdnEG_7blQ

petemoore avatar Jul 05 '24 07:07 petemoore

After fixing the payload, the task ran successfully once switched from podman to docker:

https://community-tc.services.mozilla.com/tasks/byFoAENQRbCmai1EjqVhHg

The generated tasks were put in a different task group to the decision task, but that is probably just a consequence of the manually-created decision task having a taskGroupId that differs from its taskId. Probably the decision task sets the taskGroupId of generated tasks to its own taskId rather than to its own taskGroupId. I don't think that is a good idea, but it is probably a bug in Task Graph rather than something related to this issue.
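
A toy illustration of that suspicion, not Task Graph's actual code: the function `generated_task_group`, the `inherit_group` switch, and the placeholder IDs are all hypothetical. It assumes that an automatically created decision task has a taskGroupId equal to its taskId, while the manually created one does not.

```python
def generated_task_group(decision_task, inherit_group):
    """Return the taskGroupId that generated tasks would receive.

    inherit_group=False models the suspected behaviour (use the decision
    task's own taskId); inherit_group=True models the alternative (reuse
    the decision task's taskGroupId).
    """
    if inherit_group:
        return decision_task["taskGroupId"]
    return decision_task["taskId"]


# Placeholder IDs, for illustration only.
automatic = {"taskId": "decision-A", "taskGroupId": "decision-A"}
manual = {"taskId": "decision-B", "taskGroupId": "group-C"}

for name, task in (("automatic", automatic), ("manual", manual)):
    suspected = generated_task_group(task, inherit_group=False)
    same_group = suspected == task["taskGroupId"]
    print(f"{name}: generated tasks share the decision task's group: {same_group}")
```

Under these assumptions the two strategies only diverge for the manually created decision task, which would match what was observed here.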

However, I now wonder whether the original failure is related to the --privileged argument being passed to podman (which was added in https://github.com/taskcluster/taskcluster/pull/6891). So I am running again under podman, this time without --privileged: https://community-tc.services.mozilla.com/tasks/ZEeRJCWZTYOmiFWfPY3mRQ.

petemoore avatar Jul 05 '24 08:07 petemoore

OK, that indeed fixed the issue. So https://github.com/taskcluster/taskcluster/issues/6891 has inadvertently broken the WPT tasks. 😬

I suspect the best approach here is to find a different fix for https://github.com/taskcluster/taskcluster/pull/6890.

petemoore avatar Jul 05 '24 08:07 petemoore

OK, that indeed fixed the issue. So taskcluster/taskcluster#6891 has inadvertently broken the WPT tasks. 😬

Reverting in https://github.com/taskcluster/taskcluster/pull/7127

petemoore avatar Jul 08 '24 22:07 petemoore

Testing under Generic Worker 67.1.0 with docker worker payload here: https://community-tc.services.mozilla.com/tasks/Hoh_ZmsCSAu8EZ-cPPtR7w

petemoore avatar Jul 16 '24 09:07 petemoore

@jgraham In the test migration from docker worker to generic worker, a couple of failures showed up in the task group: https://community-tc.services.mozilla.com/tasks/groups/Hoh_ZmsCSAu8EZ-cPPtR7w Are you familiar with these errors? Are they of concern, or are they known issues? They may be a consequence of migrating from docker to podman. If they are new, we can experiment with using docker instead of podman to see if that resolves them. Thanks!

petemoore avatar Jul 16 '24 17:07 petemoore

Looks like we saw some Chrome crashes in the infra tests. I'm not aware of those as existing issues, and Chrome is pretty sensitive to the container environment because of the way its sandboxing works, so they could be related to the change.

A useful thing to do here would be to force-push these changes to the triggers/firefox_nightly and triggers/chrome_canary branches to get full runs of those browsers that we could use to look for regressions.

jgraham avatar Jul 17 '24 10:07 jgraham