ecs-conex icon indicating copy to clipboard operation
ecs-conex copied to clipboard

Use cached layers if repository has a yarn lockfile

Open rclark opened this issue 8 years ago • 16 comments
trafficstars

Right now, images are built with the --no-cache flag. Using cached layers is a way to significantly decrease build times, and long build times are one of the biggest bummers about our current CI flow.

One of the arguments pro --no-cache is that without it, npm install with semver version identifiers in package.json could lead to images that use old cached layers for node.js dependencies. This could lead to unexpected (and very non-deterministic) mismatches between your local environment and your production environment.

Yarn's use of a lockfile that pins node.js dependency versions and is committed in the repo avoids this misstep, and makes me wonder if we could drop the --no-cache flag if there's yarn file in the repo.

However there are still a few other questions to weigh against such a decision:

  • You would only get build-time caching benefits sometimes and not all the time. This depends on whether your conex worker task lands on an EC2 that still has the cached layers from a previous build.

  • Due to ^^, you'd may want to try and keep cached layers laying around on the EC2s for longer, and this leads to disk space management problems.

It may be worth exploring this anyways, without adjusting anything about how we have our EC2s clean up old images/layers. If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

cc @springmeyer @scothis @mcwhittemore @mapsam @GretaCB

rclark avatar Feb 09 '17 18:02 rclark

You would only get build-time caching benefits sometimes and not all the time

Personally this doesn't feel like a reason to not do it, but I can see why it would be confusing if ecs-conex times start varying wildly.

disk space management problems

This would impact the entire EC2 right? So any task running on this EC2 could be affected, not just the service which has started caching images? Which services currently use up the most disk space?

mapsam avatar Feb 09 '17 18:02 mapsam

Yes, disk space is an EC2-wide problem. And controls on disk space are really outside the scope of ecs-conex's responsibilities, except that at present, conex explicitly cleans up after itself, removing everything about the image that it just built, in an effort to reduce impact on any other services that might be sharing the host EC2s.

rclark avatar Feb 09 '17 18:02 rclark

If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

The move to yarn itself will help a lot with node.js dependencies so if that's a requirement and there is a downside to doing this, we might want to see what the gains from using yarn are first.

mcwhittemore avatar Feb 09 '17 20:02 mcwhittemore

Thanks for writing this up and providing a view into the state of things.

non-deterministic builds feels like a major drawback.

If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

My feeling is that we should continue to focus on trimming the trees and unix package dependencies as a way to speed things up. There are still a lot of duplicate npm packages happening in projects and we can move to using more mason package in place of apt deps to drop weight.

springmeyer avatar Feb 10 '17 00:02 springmeyer

That's a good point @mcwhittemore - yarn will definitely bring in some big savings, esp around duplicate npm packages.

mapsam avatar Feb 10 '17 23:02 mapsam

There are still a lot of duplicate npm packages happening in projects

Agreed. This is one benefit of yarn. Another is that it is deterministic unlike npm due to its yarn's lockfile.

mcwhittemore avatar Feb 13 '17 16:02 mcwhittemore

One of the arguments pro --no-cache is that without it, npm install with semver version identifiers in package.json could lead to images that use old cached layers for node.js dependencies. This could lead to unexpected (and very non-deterministic) mismatches between your local environment and your production environment.

What about being able to specify whether caching should be enabled or not in a package.json or .ecs-conex file?

A simple way for caching would be to download the image of the previous commit and then run the build without no-cache.

lukasmartinelli avatar Mar 03 '17 16:03 lukasmartinelli

download the image of the previous commit and then run the build without no-cache

I believe we originally did this, and then removed it once we implemented --no-cache. Worth exploring for sure, though you'd want to some way to make sure that downloading time isn't going to be a significant penalty.

rclark avatar Mar 03 '17 16:03 rclark

IMHO it is the responsibility of the image creator that it always yields consistent results - not of the build infrastructure.

The long wait times for a image build slow down iteration time a lot.

I believe we originally did this, and then removed it once we implemented --no-cache. Worth exploring for sure, though you'd want to some way to make sure that downloading time isn't going to be a significant penalty.

In my experience fetching from NPM or apt is usually always slower than downloading the compressed filesystem layer from the internal network (even though AWS provides own mirrors).

lukasmartinelli avatar Mar 16 '17 14:03 lukasmartinelli

I think that the next action here is a PR to download previous images before a build, and then gather some metrics on build durations to confirm that we're seeing a benefit. Once #94 lands, the duration of each build will be captured in CloudWatch automatically (new watchbot feature).

In terms of consistency between builds, I'm comfortable using the existence of a yarn lockfile as a queue that we should download a prior image and build off of it. I agree with the sentiment that images should handle this themselves @lukasmartinelli, but the reality of our npm usage is that we aren't there for most of our builds.

rclark avatar Mar 16 '17 16:03 rclark

:hugs: I still really really want faster image builds! Give us caching platform overlords :hugs:

lukasmartinelli avatar May 02 '17 20:05 lukasmartinelli

1oav1r

lukasmartinelli avatar May 02 '17 20:05 lukasmartinelli

@lukasmartinelli let me just clone myself real quick. brb.

rclark avatar May 03 '17 21:05 rclark

@lukasmartinelli let me just clone myself real quick. brb.

Imagine if we had three clarks!! 🤗 ❤️

lukasmartinelli avatar May 03 '17 22:05 lukasmartinelli

🤗 🤗 🤗

Bump. I think this will lead to a lot more developer productivity 🔨 + saved ⏲ => saved 💵 + more dev 😄.

Especially when iterating on CloudFormation templates. When one edits the CloudFormation files the Docker image shouldn't need to be rebuild.

lukasmartinelli avatar Jun 22 '17 20:06 lukasmartinelli

Per chat, this is not something that is likely to move in the next couple months.

emilymcafee avatar Jun 22 '17 20:06 emilymcafee