[Fargate]: New Fargate PV label (request for feedback)
The container service team is considering adding a brand new Fargate platform version (PV) label in addition to LATEST to enhance the user experience of Fargate customers. There are a couple of options we are discussing (in addition to the third option which is “do nothing, keep things as they are”). We would like your feedback on which one would work best for you, and we would also like to hear any other proposal you may have. Please read the context below and provide feedback about what you would like us to implement.
Introduction
Today Fargate platform versions (PVs) are tagged with a specific version (1.0, 1.1, 1.2, 1.3, 1.4). In addition, one PV is tagged with the label LATEST to allow you to rely on the latest available PV without having to call out and specify its version. It’s important to understand that the LATEST tag is always resolved to the actual PV version when the task is deployed. In general, contrary to general container best practices, where the latest tag for a container image should not be used, using LATEST for a Fargate PV can be considered a good practice. Note this issue only relates to Fargate when used with the ECS orchestrator (when using Fargate with EKS the platform version being used is automatically picked by the EKS platform being chosen). If you want to know more about Fargate PVs read this blog post.
Current situation
Since we launched Fargate, at every new platform version release, we have always moved the LATEST tag immediately. With the last 1.4 release (launched on April 8th) we decided to take a different and more conservative approach of not moving the LATEST tag immediately. We knew that 1.4 included a lot of changes and we wanted to give customers that have set their pipelines to leverage LATEST some runway to test the 1.4 release before deploying it as the new “default” PV. As of early September we haven’t yet moved the LATEST tag to 1.4 and it still points to 1.3. We are planning to move it soon though. While we stand by this decision because it was the right thing to do to provide a good customer experience, we are also aware that we broke an expectation that many of you had in terms of leveraging the LATEST tag (words have a meaning). It became obvious that there are two mindsets out there: one that is willing to take some compatibility risks in return for accessing additional features immediately, and one that is willing to delay access to new features in return for more stability and less risk. Unfortunately, with one label, we can only satisfy one approach at a time.
Proposal
We are considering adding a second tag to Fargate platform versions to address the two use cases and mindsets above. However, we need to do so in a way that doesn’t break the UX. These are the options under consideration with pros and cons:
- Introduce a new label called DEFAULT. This becomes the “stable” label and LATEST can be used to mean what it means. We’d move LATEST immediately and keep DEFAULT to the n-1 PV for x months. Then we’d move it and you would point to the same (latest) release. If you do not provide any PV label/tag at service or task deployment time, you will be using LATEST (current implementation) which means you would be automatically enrolled into the “faster features access” behavior.
  - Pros:
    - The names are meaningful and are self-explanatory
    - It’s consistent with what users expect (LATEST means latest)
  - Cons:
    - It requires customers that prefer a more conservative experience to explicitly opt in to the new label and change their deployments to use DEFAULT
- Introduce a new label called EDGE. This becomes our “edgy” release and takes on the role that today we have for LATEST. LATEST becomes our (so called) stable release. We’d move EDGE immediately and keep LATEST to the n-1 PV for x months. Then we’d move it and you would point to the same (latest) release. If you do not provide any PV label/tag at service or task deployment time, you will be using LATEST (current implementation) which means you would be automatically enrolled into the “slower/stable features access” behavior.
  - Pros:
    - It does not require re-educating users to switch to a different tag if they want to leverage a more conservative approach. If you don’t do anything you are on the slower track, if you want to move faster you need to use the EDGE tag.
  - Cons:
    - The names are misleading (LATEST doesn’t mean latest)
    - This would be a more conservative approach (too conservative?) and it may not address those customers expecting LATEST to be actually latest.
    - EDGE has a particular meaning in this industry and if we want the next Fargate PV release to be a GA release (and not a preview) the EDGE tag may be misleading.
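To make the difference between the two options concrete, here is a minimal Python sketch of how a label might resolve to a PV right after a 1.4 release, while the "stable" label still sits on n-1. The tables `OPTION_1` and `OPTION_2` and the `resolve` helper are hypothetical names used only for illustration; this is not how Fargate implements label resolution.

```python
from typing import Dict, Optional

# Hypothetical snapshot: 1.4 has just been released and the "stable" label
# has not been moved yet (it still points to n-1).
NEWEST = "1.4"
PREVIOUS = "1.3"

# Option 1: LATEST moves immediately, the new DEFAULT label lags behind.
OPTION_1 = {"LATEST": NEWEST, "DEFAULT": PREVIOUS}
# Option 2: the new EDGE label moves immediately, LATEST lags behind.
OPTION_2 = {"EDGE": NEWEST, "LATEST": PREVIOUS}


def resolve(label: Optional[str], labels: Dict[str, str]) -> str:
    """Resolve a PV label to a concrete version at deployment time.

    An omitted label falls back to LATEST under both options, as described
    in the proposal. An explicit version such as "1.2" is returned unchanged.
    """
    if label is None:
        label = "LATEST"
    return labels.get(label, label)
```

The practical difference shows up when no label is given: `resolve(None, OPTION_1)` lands on the newest PV, while `resolve(None, OPTION_2)` lands on the previous one.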
Bonus question
So far we have always introduced Fargate platform versions in general availability (GA). This means that regardless of the labeling strategy above, every new PV introduced so far has been production-grade and fully supported. Should we decide, for a potentially new PV in the future, to make it available as a preview, we have a couple of options and we would like to hear your feedback on this as well:
- do not move the tags to this version and keep any of the tags (LATEST + DEFAULT or EDGE) always only tied to GA versions of the PVs. This approach provides a very conservative behavior (no matter what, you know that any tag points to a GA fully supported version) but may miss the vision for having a LATEST or EDGE tag (for customers that may want to always point to the very most recent PV available regardless of supportability status)
- move the tag that is supposed to signal the most recent PV version (LATEST or EDGE depending on which strategy above we may pick) to the preview PV we are introducing. This approach provides a very agile way to stay on the always most recent PV available (regardless of supportability status) but may not provide a consistent behavior (that is, the tag that is supposed to represent the most recent PV available may point to a production GA release or a tech preview depending on the status we want to launch a PV with).
Thank you for the detailed write-up.
We would prefer the EDGE approach as we have a fairly large fargate deployment footprint in our enterprise. As such, it may be difficult for some teams to opt-out of a change to an edge-like experience.
Thanks again for involving the community on this decision!
I think I have a variation on "Option 1" that I'd like to propose:
Have the labels be Stable and Latest
Latest would go back to its original meaning, and Stable follows the same descriptive pattern. Stable would be the default choice if you don't opt in to using Latest.
One other change I'd suggest from "Option 1" above would be to one-time retroactively convert everyone currently on Latest to Stable, and allow them to opt back in to be on the faster moving path. The workflow would look like this:
- Message all customers affected about what will happen (repeatedly)
- Create Stable tag
- s/Latest/Stable/ ${all taskdefs}
- Message all customers affected about what just happened, instructing them to opt back in to Latest if so desired.
- Release next platform version
- Move Latest tag to point to the new platform version
Thumbs-up if you think this is a good idea!
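The one-time conversion step in the workflow above could look roughly like the following Python sketch. It works on plain dictionaries with hypothetical `name` and `platform_version` fields rather than on real ECS resources, and it assumes a `STABLE` label exists, which is part of this suggestion and not something Fargate offers today.

```python
from typing import Dict, List, Tuple


def migrate_latest_to_stable(
    deployments: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Rewrite every LATEST platform version to STABLE.

    Returns the migrated records plus the names of the records that were
    changed, so their owners can be told how to opt back in to LATEST.
    Records pinned to an explicit version are left untouched.
    """
    migrated = []
    affected = []
    for record in deployments:
        record = dict(record)  # copy so the caller's data is not mutated
        if (record.get("platform_version") or "").upper() == "LATEST":
            record["platform_version"] = "STABLE"
            affected.append(record["name"])
        migrated.append(record)
    return migrated, affected
```

Returning the list of affected names separately keeps the "message all customers affected" step independent of the rewrite itself.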
My $.02: I'd go with option 1 (introducing a new label called DEFAULT). Given that LATEST has always (well, until 1.4) meant the latest GA, this would be compatible with the expectations anyone should have reasonably had when they chose to deploy with LATEST. I don't think the "con" you listed is a con at all. If someone preferred a more conservative experience, they should not have chosen LATEST in the first place, IMO. Sell this as the new DEFAULT label being a new "managed conservatism" offering. Previously, someone seeking a conservative approach had to manually track the PV version and increment when the landscape met their definition of safe.
One suggested delta: if the label/tag is not specified, DEFAULT should be used (as, well, the default), not LATEST.
(I don't like calling the new label STABLE because it implies that the most recent GA release is not necessarily stable. To me, the GA status implies stable.)
wrt the bonus question: assuming you went with the DEFAULT+LATEST option, i'd only point LATEST and DEFAULT to GA releases. Perhaps you could even add EDGE as a new label that is defined as opting-in to preview. So, for instance, DEFAULT might currently be 1.3, LATEST would currently be 1.4, and EDGE could point to a 1.5 preview that is unveiled at reinvent. Then, once 1.5 becomes GA, LATEST and EDGE would both point to 1.5 and DEFAULT moves to 1.4 when it's battle-tested.
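The three-label scheme in this comment can be written down as a small Python sketch. The `labels_for` function is hypothetical, and it simplifies one point: DEFAULT is taken to be the GA release just before the newest GA one, ignoring the "battle-tested" waiting period mentioned above.

```python
from typing import Dict, List, Tuple


def labels_for(releases: List[Tuple[str, str]]) -> Dict[str, str]:
    """Compute label targets from releases ordered oldest to newest.

    Each release is a (version, status) pair with status "ga" or "preview".
    EDGE follows the newest release of any status, LATEST follows the newest
    GA release, and DEFAULT follows the GA release before that one.
    """
    ga = [version for version, status in releases if status == "ga"]
    return {
        "EDGE": releases[-1][0],
        "LATEST": ga[-1],
        "DEFAULT": ga[-2] if len(ga) > 1 else ga[-1],
    }


# While 1.5 is still a preview.
during_preview = labels_for([("1.3", "ga"), ("1.4", "ga"), ("1.5", "preview")])
# Once 1.5 becomes GA.
after_ga = labels_for([("1.3", "ga"), ("1.4", "ga"), ("1.5", "ga")])
```

With these inputs, `during_preview` maps DEFAULT to 1.3, LATEST to 1.4 and EDGE to 1.5, and `after_ga` maps DEFAULT to 1.4 with both LATEST and EDGE on 1.5, matching the example in the comment.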
I found the "Option 1" better. It is, as you said: "meaningful and self-explanatory".
LATEST should always provide the latest GA PV release. Perhaps, your default label could be called STABLE, instead of DEFAULT? – At the point of n-1, the PV should be stable, I'd be worried if not. 😄 On the other hand, I also understand if you end up using DEFAULT – it is much less biased than STABLE.
The EDGE label could and should come out with the very latest and greatest, cutting edge, non-production grade features. Using it should require user consent – opt-in, like any other preview.
Version 1.4 of the PV should have really been 2.0. It has a bunch of breaking changes due to the way it handles network requests to registries and the Fargate control plane. A task running on 1.3 in a VPC without internet gateway and ECR private link will fail to launch once it is upgraded to 1.4.
How about adopting semver for versioning the PV and let the customer decide which version they want to stick to (major, minor, patch). The user would specify 1.x as their PV and they would only receive non-breaking changes. This will, of course, require that the Fargate team actually uses semver as intended and recognizes breaking changes as such and increments the major version (2.0.0 instead of 1.4).
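As a rough illustration of the idea, the Python sketch below resolves a constraint such as `1.x` to the highest matching version. The function names and the version list are hypothetical (Fargate PVs are not published this way), and the sketch assumes the wildcard `x` only appears in trailing positions.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn "1.4.1" into (1, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_constraint(constraint: str, available: List[str]) -> Optional[str]:
    """Return the highest available version matching the constraint.

    "1.x" matches any 1.*.* release, "1.4.x" any 1.4.* release, and an exact
    version such as "1.4.0" matches only itself. Returns None if no
    available version matches.
    """
    wanted = []
    for part in constraint.split("."):
        if part.lower() == "x":
            break  # everything after the wildcard is unconstrained
        wanted.append(int(part))
    prefix = tuple(wanted)
    matches = [v for v in available if parse(v)[: len(prefix)] == prefix]
    return max(matches, key=parse) if matches else None


# Hypothetical catalogue in which the breaking release was numbered 2.0.0.
AVAILABLE = ["1.3.0", "1.4.0", "1.4.1", "2.0.0"]
```

A user pinned to `1.x` would then keep receiving 1.* releases and would only move to 2.0.0 by changing the constraint explicitly.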
+1 to the stable + latest approach.
I didn't know 1.4 had breaking changes so +1 to semver also.