Debug: VRT docker image

Open benbowler opened this issue 1 year ago • 6 comments

Summary

Addresses issue:

  • #9528

Relevant technical choices

WIP switch to Ubuntu/Debian as Chromium isn't officially supported on Alpine. I'm going to run the VRTs job over and over on this branch to see if it fails or not across different times of day.

Also, based on a suggestion from @techanvil, I lock the version of Chromium installed by apt so that it stays consistent over time.

Notes below on updates and continued testing...

PR Author Checklist

  • [x] My code is tested and passes existing unit tests.
  • [ ] My code has an appropriate set of unit tests which all pass.
  • [x] My code is backward-compatible with WordPress 5.2 and PHP 7.4.
  • [x] My code follows the WordPress coding standards.
  • [ ] My code has proper inline documentation.
  • [ ] I have added a QA Brief on the issue linked above.
  • [x] I have signed the Contributor License Agreement (see https://cla.developers.google.com/).

Do not alter or remove anything below. The following sections will be managed by moderators only.

Code Reviewer Checklist

  • [ ] Run the code.
  • [ ] Ensure the acceptance criteria are satisfied.
  • [ ] Reassess the implementation with the IB.
  • [ ] Ensure no unrelated changes are included.
  • [ ] Ensure CI checks pass.
  • [ ] Check Storybook where applicable.
  • [ ] Ensure there is a QA Brief.
  • [ ] Ensure there are no unexpected significant changes to file sizes.

Merge Reviewer Checklist

  • [ ] Ensure the PR has the correct target branch.
  • [ ] Double-check that the PR is okay to be merged.
  • [ ] Ensure the corresponding issue has a ZenHub release assigned.
  • [ ] Add a changelog message to the issue.

benbowler avatar Oct 17 '24 17:10 benbowler

Update 18 October 2024:

Using the Ubuntu image created issues with installing a reliable Node version, so I changed tack: I used the official Node Debian image and then locked the version of Chromium here. One thing to note is that the Debian image is ~1.6GB vs ~600MB for the old Alpine image. This adds overhead for the CI job, and for local developers the first time they run the VRTs.

I'll continue to run the job now and see if it can complete reliably.

benbowler avatar Oct 18 '24 12:10 benbowler

Update 18 October 2024: I switched back to Alpine with locked Chromium and Node versions, following a suggestion from @techanvil. However, the Node Alpine image for Node 14.20 includes a much older version of Chromium, and I had multiple issues with failures differing between similar runs, so I reverted to Debian and will test that image some more.

benbowler avatar Oct 18 '24 12:10 benbowler

@techanvil there are so many ways to slice things and so many base images to choose from: OS base images to which we add our own Node and Chromium, or Node images to which we add Chromium. I've also tried the puppeteer/puppeteer image this afternoon. Update so far:

  • alpine:3.17 base image (the original image; the failure to launch appeared to be fixed but re-occurred).
  • node:alpine has been very flaky.
  • node:debian has been stable with 7 test runs successful across the day (although this adds to the build time and adds 1.2GB to the total docker image size).
  • ghcr.io/puppeteer/puppeteer: however, we're on Puppeteer 10 and the oldest version supported by this image is 16, which is already 2 years old.

I feel like working on updating packages could help here, getting us to more recent Node, Jest, Storybook and Puppeteer package versions. This is an ongoing project, prime for hackathons as well as for the tickets we have such as #9408.

benbowler avatar Oct 18 '24 15:10 benbowler

Thanks @benbowler, it's interesting to hear the progress to date. It's a good point that updating the various package versions across the board is the preferred long-term solution; it's something we need to address holistically across the whole project.

That said, in the short term we do just want to get a quick fix in to address the pain point of our VRTs failing so regularly in CI. I had imagined that figuring out what, if anything, had changed in our current Alpine-based image might be the most efficient way to tackle this. But if it's easier to update the image to use Debian, and that gives good results, then we can certainly take that route for now. Even if it adds a bit of time (I think ~30 seconds?) to creating the image and the image is larger, if it stabilises the test run it's a price worth paying for the overall time it will save us. We can raise a followup issue to look at finessing this further in whatever way makes the most sense.

I do remain of the opinion we should probably revert to Alpine, but there are of course other options we can look at for speeding things up with a Debian or other image base.

Btw, please don't forget the point I made on my other comment about the Node version, we don't need to pin it to 14 for the Backstop image :)

techanvil avatar Oct 21 '24 09:10 techanvil

Thanks @techanvil, I've updated the Node version to node:16.14.2-bullseye, tested it, and can confirm this still appears stable.
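
For illustration only, the image discussed in this thread (a Node Debian base with an apt-installed, version-locked Chromium) could look roughly like the sketch below. This is not the Dockerfile from the PR. The `CHROMIUM_VERSION` build argument is a hypothetical name, the exact version string is deliberately left unset, and pointing Puppeteer at the system browser via environment variables is an assumption about how the image would be wired up.

```dockerfile
# Sketch only, not the PR's actual Dockerfile.
# Base tag as named in this thread.
FROM node:16.14.2-bullseye

# Hypothetical build argument: pass an exact version string, for example one
# listed by `apt-cache madison chromium` inside this base image.
# The build fails if it is left empty, which avoids an unpinned install.
ARG CHROMIUM_VERSION

# Install the locked Chromium version from the Debian repositories.
RUN apt-get update \
    && apt-get install -y --no-install-recommends "chromium=${CHROMIUM_VERSION}" \
    && rm -rf /var/lib/apt/lists/*

# Assumption: Puppeteer should use the system browser rather than download its own.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
```

A build would then pass the version explicitly, for example `docker build --build-arg CHROMIUM_VERSION=<exact version> .`, so the same Chromium is installed each time the image is rebuilt.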

I've written up the current state to an IB to put through to IBR officially.

benbowler avatar Oct 21 '24 13:10 benbowler

Thanks @benbowler! I've unassigned myself from the issue in IBR so someone else can review it if they have capacity, although I will pick it up myself when I get a moment if it's still there to review.

techanvil avatar Oct 21 '24 14:10 techanvil

Closing this in favour of #9559

benbowler avatar Nov 24 '24 02:11 benbowler