copilot-cli Separate addon stacks to avoid circular dependencies

Current behavior

At the moment Copilot combines all addons related to a workload into a single CloudFormation stack, the AddonsStack.

Unfortunately, this limits the usability of addons, because circular dependencies are very easy to produce. Take the following use case with these 2 addons:

I've got an S3 bucket, which I'd like to create as a workload addon. I'd like to pass the bucket's name to my Backend Service's ECS task. -> so the Copilot ECS Service depends on my addon (transitively). The dependency looks like this: the addon stack outputs the S3 bucket name, and the service references it via a !Ref added by a cfn patch
I've got a CloudWatch alarm, which I'd use to monitor some metric on my ECS service, so I need the ECS Service's name to create this addon -> so my addon depends on the ECS Service. The dependency looks like this: via addons.parameters.yml I pass the !Ref to the ECS Service down to the addon stack

Because my addons are collected to a single stack, being upstream and downstream to a Copilot resource is not possible:

Circular dependency between resources: [AddonsStack, Service, TaskDefinition]

Maybe there are some crazy workarounds, but I see the design choice as the main problem, that the addons are put into a single stack.

Desired behavior

Create one stack for every addon separately. This leaves more room for each addon, as it can be decided for each one of them, whether they depend on Copilot resources or vica versa.

Additional challenges

If this feature was to be implemented, it is going to be a challenge to migrate the current shared addon stacks to separated stacks. For productive applications it is presumably undesired to destroy the shared addon stack with all its resources and to create the new stacks, a smoother transition would be importing the old resources into the new stack and the deletion of the old stack without deleting the resources.

Related issues

#3149: very similar, but it went more into the direction of workarounds, with this issue I'd focus on the design decision to collect all addons into a single stack

Jan 20 '24 10:01 bencehornak

Thanks for the well thought-out write up @bencehornak, and for resurfacing this concern ❤️! This makes a lot of sense to me.

What do you think of the proposal of having multiple nested stack? For example, with ⬇️

service-name/
   manifest.yml
   addons/ 
      template.yml              # Existing addons stack.
      addons.parameters.yml
      alarm/                       # New addons stack with the CloudWatch alarm.
        template.yml
        addons.parameters.yml

You'd end up with ⬇️ CloudFormation stack:

Resources:
  AddonsStack: # Existing stack with the S3 bucket.
    Type: AWS::CloudFormation::Stack
  AddonsStackCloudWatchAlarm: # New stack that contains the cloudwatch alarm.
    Type: AWS::CloudFormation::Stack

The dependency will go like like: AddonsStackCloudWatchAlarm -[depends on]-> ECS service -[depends on]-> AddonsStack (with the S3 bucket).

With this proposal, the addons resources lifecycle are still controlled by the workloads. CloudFormation will still automatically resolve the order in which to deploy the resources. One downside is that if a user uses this mechanism just to avoid circular dependency, then they are sort of forced to separate resources into individual nested stacks, instead of following the logical grouping that you wanted. The "additional challenges" that you mentioned is still a challenge in this proposal. This in my mind is probably one of the trickiest part to work out.

This is the same proposal that @efekarakus had in the old issue. I am curious of your thought on this!

Thanks again ❤️ !

Jan 22 '24 07:01 Lou1415926

Hey @Lou1415926,

Thanks for your detailed thoughts on this! Indeed, @efekarakus's proposal would address this problem, with that one could really break the circle of dependencies.

I've got some questions to challenge the design of the proposed directory structure:

Isn't it confusing for the new Copilot users that they can put addons directly in the addons folder, as well as inside these directories (which will contain just 1 file in most cases, when there are no additional parameters)?
Is there a reason to support both (besides backward compatibility)?
Would you do the 'addon merging' between the files in the same subdirectory?

My initial idea was more radical: I would just remove the 'addon merging' feature from Copilot altogether, because quite frankly, I see it as a bug rather than a feature. So I would produce a different result for the same input, a dedicated stack for each addon:

service-name/
   manifest.yml
   addons/ 
      template1.yml              # Will translate into AddonsStackTemplate1
      template2.yml              # Will translate into AddonsStackTemplate2
      addons.parameters.yml

This has the advantage of preserving the clean directory structure and it eliminates the possibility of my main concern, the implicit circular dependencies. On the other side it raises two questions:

Does it make sense to share the addons.parameters.yml between the addons? Or would it be better to create one for each concerned addon (e.g. template1.parameters.yml)?
And more importantly: how to do the migration of old users? Should Copilot force (or at least recommend) them to move the addon resources into separate stacks? Alternatively, Copilot could ask, if the user wants to benefit from the addon separation, and if the user says no, it could append all addon yamls into a long combined yaml, performing the 'addon merging' on a different but more intuitive level.

Is there any advantage of the 'addon-merging', which I overlook? Do you know, why it was introduced in the first place?

Let me know, what you think!

Jan 22 '24 08:01 bencehornak

I'll discuss your proposal first!

Does it make sense to share the addons.parameters.yml between the addons? Or would it be better to create one for each concerned addon (e.g. template1.parameters.yml)?

I am leaning towards one-for-each addons, simply because it's quite likely that some addons need different parameters (some need !Ref Service, the others might want !Ref TaskDefinition) than the other addons.

how to do the migration of old users?

I unfortunately don't have a clear answer right now! I think this definitely needs a lot more thoughts. Here's my current thinking:

In your proposal, there is one question that we need to ask users: whether to separate the stacks or not. We can't infer the answer from the file structure like with the other proposal - so we sort of have to explicitly ask:

By file name? For example, a user can have a.template1.yml a.template2.yml, and b.template3.yml, which will result in two nested stacks: a and b.
By prompting a question + adding a flag? Prompted questions need to be shipped with respective flags to work with CI/CD. I imagine the flag will be named something like --addons-in-separate-stacks which defaults to false (quite necessarily, for backward compatibility - "moving" resources from a merged stack to a separated stack actually means deleting + creating the resource).
Other ideas that I haven't thought of.

The second option isn't super clean, because the flag is really only necessary for the sake of backward compatibility for existing users - I'd imagine a new user won't mind at all if the addons are deployed in separate stacks.

The first option looks better - but I don't think we currently error out for any existing addons named x.y.yml 😂 So more thoughts need to be put into how the format should be like + how to process it.

Jan 22 '24 11:01 Lou1415926

Yes, a lot of open ends, not a trivial task at all.

Let me try to summarize the requirements and the mentioned ideas (let me know, if you would add something).

Functional requirements

Users of the current Copilot version should be able to keep their resources during a Copilot version upgrade (no deletion plus recreation is allowed)
Users of the current Copilot version should be able to opt-in for the separate addons stack feature
1. There should be a hard switch possibility, where the old stacks (and with them the addon resources) are deleted first and the separate stacks are created
2. Optionally a soft switch could be implemented (deleting the stack but keeping the resources (DeletionPolicy: retain) and importing them to the new addon stack
For users of future Copilot versions the separate addon stacks feature should be active by default

Possible implementations

Implementation 1: support both merged and separated addon stacks, decide based on the filename

I would put both ideas here:

Implementation 1a: sub-directories for separate stacks

service-name/
   manifest.yml
   addons/ 
      template.yml              # Existing addons stack.
      addons.parameters.yml
      alarm/                       # New addons stack with the CloudWatch alarm.
        template.yml
        addons.parameters.yml

Implementation 1b: special filenames

A user can have a.template1.yml, a.template2.yml, and b.template3.yml, which will result in two nested stacks: a and b.

(quoted from @Lou1415926)

My opinion on 1a and 1b

It fulfills the backward compatibility requirements. However, I find it confusing, as future users will be wondering, why two filename conventions are in place, and chances are that they will choose a naming convention without knowing the context.

Implementation 2: activate the addons in separate stacks feature with a CLI flag

Prompted questions need to be shipped with respective flags to work with CI/CD. I imagine the flag will be named something like --addons-in-separate-stacks which defaults to false (quite necessarily, for backward compatibility - "moving" resources from a merged stack to a separated stack actually means deleting + creating the resource).

(quoted from @Lou1415926)

My opinion on implementation 2

This parameter impacts the generated CloudFormation code just like any other parameter in any of the other manifests. I think this switch belongs to the version controlled manifest files, just like any other parameters having an influence on the CloudFormation output.

Implementation 3: create a feature flags section in some global manifest file

If there was a manifest for the app, users could enable and disable Copilot features there:

# copilot/application.yml (or even the existing copilot/.workspace file,
# whose name doesn't really follow the Copilot naming convention)

application: parcelos
feature_flags:
  addons_in_separate_stacks: yes

Alternatively, it could be controlled on a workload level, if you want to give more configuration options to the users.

My opinion on implementation 3

It gives full control to the users when they want to opt in for this feature. When a new project gets initialized, Copilot can prefill the feature_flags with the updated defaults, so future users will only be able to see the most recent behavior. Users of older versions can decide to opt in, whenever they are ready.

Implementation 4: remove addon merging functionality

Every file will be translated to a separate CloudFormation nested stack:

service-name/
   manifest.yml
   addons/ 
      template1.yml              # Will translate into AddonsStackTemplate1
      template2.yml              # Will translate into AddonsStackTemplate2
      addons.parameters.yml

In this case a backward-compatible migration from earlier Copilot versions would be the concatenation of previously separate files into a single large addons file.

My opinion on implementation 4

Probably the most drastic solution: some users might be annoyed when they realize that their addon files will be merged into a single one. However, this is the option which displays most transparently what was happening inside Copilot, and would be the most future-proof among all the other options. It explains the users, how their addons yamls will translate into stacks, no easter eggs, no implicit dependencies. Users can also opt in at any time, when they decide to move resources to a separate stack (however, they will have to destroy + create them, unless they want to do a tricky stack deletion + import by hand).

Conclusion

I think it's a change which Copilot will benefit from in the long run, because it results in a more transparent behavior. In the short term the user's attention might be necessary to figure out what to do with the existing resources.

I personally would tend to choose implementation 4, which will disturb existing users by concatenating their addons once, however, after that the API will remain clean and future-proof.

Jan 22 '24 13:01 bencehornak

If you give me feedback on the Copilot team's preferences, I can try to implement it

Jan 27 '24 09:01 bencehornak

@bencehornak yay thanks for offering your help ❤️ ! We'd be happy to accept your contribution!

I recommend 1a, which a) supports multiple nested stacks and b) keeps the template merging for now). Below is my reasoning, and please let me know what you think!

The major concern for option 4 is that it breaks for users with multiple addons templates that are already merged into one stack - like you said, by deleting their resources that were managed by the old, merged addons stack. Option 1b/2/3, if shipped with option 4, can solve the issue, but each have its own problems:

We both agree that option 1b and 2 have confusing user experience.
Option 3 is a major change to Copilot that by itself warrants another discussion. For example, what to do if an application's workspaces have conflicting config files? If we are to store it as a global manifest, what other potential usage would we have of it? What's the UX for applications with multiple developers (hence multiple machines) working on it? How to configure per-application (instead of per-workspace or globally)? Basically, a global manifest would be a new pattern in Copilot, so a lot more learnings/explorations/design need to happen before we can be confident about it. To clarify, I definitely like the direction that option 3 takes! I just think it is its a huge topic that we can't rush into.

Option 4 is clean, but we are missing a nice solution to maintain backward compatibility; it's not on our 2024 roadmap yet, so the core team can't guarantee a dedicated block of time to dive into the problem just yet. Therefore, now to me the question is, should we support 1a for now and then investigate deeper into 4 in the future when we have the bandwidth, or should we just come up with a backward-compatible solution now and ship 4?

My take is that we should go with the former (1a now + 4 later). This is because:

Even if we do 4 now, we will have to support both file structure to maintain backward compatibility, at least for a significantly long periods of time if not forever. I don't want to force users to migrate from the old structure to new structure ASAP, because a) deleting + importing will be inevitable => likely a painful process b) it sounds dangerous to users no matter how well we design/document the migration path and c) for users that are already comfortably settled in with the current file structure, the benefit is very marginal but the risk to migrate is/looks high.
If we do 4 later, I think by a) generating storage init addons in the new structure b) making it clear that the new structure is recommended in the doc, we can effectively reduce confusion for new users.
Like I've mentioned, the team currently does not have bandwidth to dive in and remove the blocker for option 4. Meanwhile, 1a would solve the immediate issue (circular dependency).
The extra cost to support 4 later shouldn't be much worse than supporting it now. The additional complexity (on top of whatever complexity that the migration already has) would be on providing a migration path for users with multiple addons stack, and do end up wanting to migrate from old file structure to the new one, which isn't too bad.

Hope this clarifies things and let me know your thoughts!

Feb 05 '24 23:02 Lou1415926

@Lou1415926 thank you so much for your thoughts!

I totally get your reasoning and see things very similarly. Option 4 would have been a great design on day 1 for Copilot, from the now it would probably cause more harm than good.

And I agree, 1a is by far the simplest solution, which doesn't depend on other concepts (e.g. feature-flags), which have a huge complexity on their own.

I can try to implement 1a on my own, just one more question to understand the specification: is the addon merging desirable within the addon subfolders? (i.e. should another_stack1.yml and another_stack2.yml merged the same way into a single nested stack?)

service-name/
   manifest.yml
   addons/                 # Files in this directory are backward compatible and will be merged into the AddonsStack resource
      existing1.yml
      existing2.yml
      addons.parameters.yml
      
      another_stack/       # A separate addons stack, named AddonsStackAnotherStack
        another_stack1.yml
        another_stack2.yml
        addons.parameters.yml

Feb 06 '24 17:02 bencehornak

i.e. should another_stack1.yml and another_stack2.yml merged the same way into a single nested stack?

ohh yeah thanks for asking! I think they should - otherwise it'd be confusing because of the inconsistency. In addition, let's not worry about environment addons for now , as the sole implementation for workload addons is already quite a lot!

In general, Copilot does the following:

Read the addons folder, merge and parse the files. https://github.com/aws/copilot-cli/tree/mainline/internal/pkg/addon
- ParseFromWorkload reads the files in the <workload>/addons folder, merges the files, and returns a single addons stack for the workload.
- As an example of parsing - see how Copilot automatically applies any AWS:EC2:SecurityGroup in the Outputs section to the ECS service. This is done by parsing the Outputs section, referencing it back to the Resource section for the Type information, and then pass the information to the package's caller. (see https://github.com/aws/copilot-cli/blob/mainline/internal/pkg/addon/output.go).
- The merge logic lives here: https://github.com/aws/copilot-cli/blob/mainline/internal/pkg/addon/cloudformation.go#L48.
- I mentioned how Copilot applies certain Output automatically to the ECS service. This is an example of how the Output information is passed over to the workload stack data structure, so that the workload stack knows to add the corresponding snippets in the CloudFormation template (secrets, security groups, iam policy, and env vars).
Upload the resulted template to the S3 artifact bucket, get the URL. https://github.com/aws/copilot-cli/blob/d3b36ac9bb3d2c189c829327d20b3b10b04519bf/internal/pkg/cli/deploy/workload.go#L738
Specify the URL for the nested stack in the CloudFormation template.
- https://github.com/aws/copilot-cli/tree/mainline/internal/pkg/template/templates: where the go-templates used to render the CloudFormation template live. You don't need worry about the templates/addons folder as they are only relevant for storage init. This is where the addons stack partial is.

Thank you very much again for your willingness to help ❤️ !!! Please do let me know if you have any questions!

Feb 07 '24 04:02 Lou1415926