ecs-conex Use cached layers if repository has a yarn lockfile

trafficstars

Right now, images are built with the --no-cache flag. Using cached layers is a way to significantly decrease build times, and long build times are one of the biggest bummers about our current CI flow.

One of the arguments pro --no-cache is that without it, npm install with semver version identifiers in package.json could lead to images that use old cached layers for node.js dependencies. This could lead to unexpected (and very non-deterministic) mismatches between your local environment and your production environment.

Yarn's use of a lockfile that pins node.js dependency versions and is committed in the repo avoids this misstep, and makes me wonder if we could drop the --no-cache flag if there's yarn file in the repo.

However there are still a few other questions to weigh against such a decision:

You would only get build-time caching benefits sometimes and not all the time. This depends on whether your conex worker task lands on an EC2 that still has the cached layers from a previous build.
Due to ^^, you'd may want to try and keep cached layers laying around on the EC2s for longer, and this leads to disk space management problems.

It may be worth exploring this anyways, without adjusting anything about how we have our EC2s clean up old images/layers. If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

cc @springmeyer @scothis @mcwhittemore @mapsam @GretaCB

Feb 09 '17 18:02 rclark

You would only get build-time caching benefits sometimes and not all the time

Personally this doesn't feel like a reason to not do it, but I can see why it would be confusing if ecs-conex times start varying wildly.

disk space management problems

This would impact the entire EC2 right? So any task running on this EC2 could be affected, not just the service which has started caching images? Which services currently use up the most disk space?

Feb 09 '17 18:02 mapsam

Yes, disk space is an EC2-wide problem. And controls on disk space are really outside the scope of ecs-conex's responsibilities, except that at present, conex explicitly cleans up after itself, removing everything about the image that it just built, in an effort to reduce impact on any other services that might be sharing the host EC2s.

Feb 09 '17 18:02 rclark

If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

The move to yarn itself will help a lot with node.js dependencies so if that's a requirement and there is a downside to doing this, we might want to see what the gains from using yarn are first.

Feb 09 '17 20:02 mcwhittemore

Thanks for writing this up and providing a view into the state of things.

non-deterministic builds feels like a major drawback.

If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

My feeling is that we should continue to focus on trimming the trees and unix package dependencies as a way to speed things up. There are still a lot of duplicate npm packages happening in projects and we can move to using more mason package in place of apt deps to drop weight.

Feb 10 '17 00:02 springmeyer

That's a good point @mcwhittemore - yarn will definitely bring in some big savings, esp around duplicate npm packages.

Feb 10 '17 23:02 mapsam

There are still a lot of duplicate npm packages happening in projects

Agreed. This is one benefit of yarn. Another is that it is deterministic unlike npm due to its yarn's lockfile.

Feb 13 '17 16:02 mcwhittemore

One of the arguments pro --no-cache is that without it, npm install with semver version identifiers in package.json could lead to images that use old cached layers for node.js dependencies. This could lead to unexpected (and very non-deterministic) mismatches between your local environment and your production environment.

What about being able to specify whether caching should be enabled or not in a package.json or .ecs-conex file?

A simple way for caching would be to download the image of the previous commit and then run the build without no-cache.

Mar 03 '17 16:03 lukasmartinelli

download the image of the previous commit and then run the build without no-cache

I believe we originally did this, and then removed it once we implemented --no-cache. Worth exploring for sure, though you'd want to some way to make sure that downloading time isn't going to be a significant penalty.

Mar 03 '17 16:03 rclark

IMHO it is the responsibility of the image creator that it always yields consistent results - not of the build infrastructure.

The long wait times for a image build slow down iteration time a lot.

I believe we originally did this, and then removed it once we implemented --no-cache. Worth exploring for sure, though you'd want to some way to make sure that downloading time isn't going to be a significant penalty.

In my experience fetching from NPM or apt is usually always slower than downloading the compressed filesystem layer from the internal network (even though AWS provides own mirrors).

Mar 16 '17 14:03 lukasmartinelli

I think that the next action here is a PR to download previous images before a build, and then gather some metrics on build durations to confirm that we're seeing a benefit. Once #94 lands, the duration of each build will be captured in CloudWatch automatically (new watchbot feature).

In terms of consistency between builds, I'm comfortable using the existence of a yarn lockfile as a queue that we should download a prior image and build off of it. I agree with the sentiment that images should handle this themselves @lukasmartinelli, but the reality of our npm usage is that we aren't there for most of our builds.

Mar 16 '17 16:03 rclark

:hugs: I still really really want faster image builds! Give us caching platform overlords :hugs:

May 02 '17 20:05 lukasmartinelli

1oav1r

May 02 '17 20:05 lukasmartinelli

@lukasmartinelli let me just clone myself real quick. brb.

May 03 '17 21:05 rclark

@lukasmartinelli let me just clone myself real quick. brb.

Imagine if we had three clarks!! 🤗 ❤️

May 03 '17 22:05 lukasmartinelli

🤗 🤗 🤗

Bump. I think this will lead to a lot more developer productivity 🔨 + saved ⏲ => saved 💵 + more dev 😄.

Especially when iterating on CloudFormation templates. When one edits the CloudFormation files the Docker image shouldn't need to be rebuild.

Jun 22 '17 20:06 lukasmartinelli

Per chat, this is not something that is likely to move in the next couple months.

Jun 22 '17 20:06 emilymcafee

ecs-conex ecs-conex copied to clipboard

Use cached layers if repository has a yarn lockfile

ecs-conex
ecs-conex copied to clipboard