copilot-cli env deploy after CLI update

We're in the process of moving to using the new environments manifest. We moved from CLI version 1.17.0 to 1.21.0. I generated the manifest for our dev environment per instructions from the blog post and then ran copilot env deploy --name dev.

However, there were a number of failures during the update that we're not sure how to move past:

- Creating the infrastructure for the insuredportal-dev environment.                 [update rollback failed]  [102.3s]
  Export insuredportal-dev-ServiceDiscoveryNamespaceID cannot be updated
   as it is in use by insuredportal-dev-graphql, insuredportal-dev-ivr-s
  erver and insuredportal-dev-spa (and 1 more)
  The following resource(s) failed to update: [Cluster, DNSDelegationFun
  ction, CertificateValidationFunction, EnvironmentManagerRole].

  - An ECS cluster to group your services                                            [update failed]           [2.6s]
    Resource handler returned message: "Error occurred during operation 'S
    ettings can only be modified, not removed. Required Settings: [contain
    erInsights]'." (RequestToken: d3ec517c-99fd-a696-be4f-1d1a32c57532, Ha
    ndlerErrorCode: GeneralServiceException)

  - An IAM Role to describe resources in your environment                            [update failed]           [20.9s]
    Resource update cancelled

Oct 07 '22 12:10 benjaminpottier

Hi @benjaminpottier!

Oh no 🙇 that is super strange, each environment should be getting its own service discovery namespace 🤔

Can you tell me a little bit more about the setup:

Is the dev environment importing a VPC?
Would you mind sharing the environment manifest file?
When you run copilot env package do you see a difference in the template generated compared to what's stored in CloudFormation? (I use a tool like https://www.yamldiff.com/ to highlight the differences between the two templates)
Were there any modifications done to the resources such as the service discovery namespace outside of Copilot? through the AWS CLI or Console for example

Oct 07 '22 16:10 efekarakus

All our environments used the VPC created by copilot.
All that is in the manifest file is "name: dev" and "type: Environment". I generated it with the env show command.
YAML diff is complaining about the format from the env package output. See:

Error in left input: unknown tag !<!Ref> at line 46, column 38:
    ... Not [!Equals [ !Ref ALBWorkloads, "" ]]

We have made modifications outside copilot, but not to the service discovery. It might be worth noting that our service discovery namespaces never had the environment included in them before, except for our prod environment. We always thought this was strange. What I mean is, in our dev, test, and model environments we have <app>.local and in prod we have <env>.<app>.local.
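One way around the `!Ref` parsing error is to compare the two templates as plain text instead of as parsed YAML. The following is a minimal sketch using only Python's standard `difflib`; the function name, the file labels, and the sample strings are assumptions for illustration and are not part of Copilot.

```python
import difflib


def diff_templates(deployed: str, packaged: str) -> list[str]:
    """Return a unified diff of two templates, treated as plain text.

    Nothing is parsed as YAML, so CloudFormation short-form tags such as
    !Ref or !Sub cannot trigger an "unknown tag" error.
    """
    return list(
        difflib.unified_diff(
            deployed.splitlines(),
            packaged.splitlines(),
            fromfile="deployed.yml",  # hypothetical label for the stored template
            tofile="packaged.yml",    # hypothetical label for the env package output
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Illustrative strings only; real input would be the two saved templates.
    deployed = "Value: !Ref ALBWorkloads\nVersion: old\n"
    packaged = "Value: !Ref ALBWorkloads\nVersion: new\n"
    print("\n".join(diff_templates(deployed, packaged)))
```

A text diff is noisier than a structural one (key order and whitespace count as changes), but it is usually enough to spot removed outputs or renamed parameters.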

Oct 07 '22 16:10 benjaminpottier

Ohh!! got it!!

Can you try the release with v1.21.1 instead?

We had discovered a bug in our translation of the manifest, here is the snippet from the release notes:

Preserve existing service discovery endpoint (https://github.com/aws/copilot-cli/pull/3949)

In the transition from env upgrade to env deploy, we lost the preservation of the ServiceDiscoveryEndpoint parameter and instead assumed the [app].[env].local format. However, environments that predated our v1.9.0 release have [app].local-formatted ServiceDiscoveryEndpoint parameters, and therefore were erroring out when updates were attempted. This fix preserves the existing value when env deploy is run.
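The behavior described in the release note can be pictured as "keep the endpoint that is already deployed, otherwise fall back to the environment-scoped default". The sketch below is only an illustration in Python, not Copilot's Go implementation; the function name is hypothetical, and the environment-scoped form follows the `<env>.<app>.local` shape reported for prod earlier in this thread, so treat the exact ordering as an assumption.

```python
from typing import Optional


def service_discovery_endpoint(app: str, env: str, existing: Optional[str] = None) -> str:
    """Pick the service discovery endpoint for an environment.

    If the deployed stack already has an endpoint (for example the older
    app-only form), keep it so the namespace export is not replaced.
    Otherwise use an environment-scoped default.
    """
    if existing:
        return existing
    # Assumed ordering, based on the prod namespace reported in this thread.
    return f"{env}.{app}.local"


if __name__ == "__main__":
    # Older environment: the existing value is preserved.
    print(service_discovery_endpoint("insuredportal", "dev", existing="insuredportal.local"))
    # New environment: the environment-scoped default is used.
    print(service_discovery_endpoint("insuredportal", "dev"))
```

Preserving the existing value matters because services import the namespace export, and CloudFormation refuses to change an export that is in use.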

Oct 07 '22 16:10 efekarakus

I would, but I'm unable to because the stack is in an "UPDATE_ROLLBACK_FAILED state and can not be updated."

I have the option to "Continue update rollback" but am hesitant because of the following warning:

After the rollback is complete, the state of the skipped resources will be inconsistent with the state of the resources in the stack template. Before performing another stack update, you must update the stack or resources to be consistent with each other. If you don't, subsequent stack updates might fail, and the stack will become unrecoverable.

What would you suggest?

Oct 07 '22 17:10 benjaminpottier

Gotcha, this scenario sounds very similar to this thread: https://gitter.im/aws/copilot-cli?at=6306516cb16e8236e3287045

In that situation, a client continued the rollback while skipping the Cluster and EnvironmentManagerRole resources, which got the stack to UPDATE_ROLLBACK_COMPLETE. Afterwards, it looks like the command could succeed again with v1.21.1.
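For reference, the same "continue rollback and skip resources" action is available from the AWS CLI as `aws cloudformation continue-update-rollback`. The sketch below only assembles that command in Python so it can be reviewed before running; the helper name is hypothetical, and the stack name `insuredportal-dev` is inferred from the export names in this thread rather than confirmed.

```python
import shlex


def continue_rollback_command(stack_name: str, resources_to_skip: list[str]) -> list[str]:
    """Build the AWS CLI invocation that continues a failed update rollback.

    resources_to_skip holds logical resource IDs from the stack template.
    Skipped resources are marked complete without being rolled back, so they
    can end up out of sync with the template.
    """
    cmd = ["aws", "cloudformation", "continue-update-rollback", "--stack-name", stack_name]
    if resources_to_skip:
        cmd += ["--resources-to-skip", *resources_to_skip]
    return cmd


if __name__ == "__main__":
    # Stack name is an assumption; the logical IDs come from the error output above.
    cmd = continue_rollback_command("insuredportal-dev", ["Cluster", "EnvironmentManagerRole"])
    print(shlex.join(cmd))
```

Printing the command instead of executing it is deliberate: skipping resources is exactly the step the console warns about, so it is worth reading the final command before running it.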

Oct 07 '22 17:10 efekarakus

That kind of worked. But now I'm getting:

Statement IDs (SID) in a single policy must be unique. (Service: AmazonIdentityManagement; Status Code: 400; Error Code: MalformedPolicyDocument; Request ID: 4c4def8e-cca8-4042-884a-850a1a1bcd7a; Proxy: null)

The following resource(s) failed to update: [EnvironmentManagerRole].

Also, when I run env package now I get the following:

Only found one environment, defaulting to: dev
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x12ddede]

goroutine 1 [running]:
github.com/aws/copilot-cli/internal/pkg/describe.NewEnvDescriber({{0xc0004c8310, 0xd}, {0xc000694748, 0x3}, 0x0, {0x0, 0x0}, {0x0, 0x0}})
        /codebuild/output/src864945534/src/internal/pkg/describe/env.go:78 +0x7e
github.com/aws/copilot-cli/internal/pkg/cli/deploy.NewEnvDeployer(0xc0008d4c90)
        /codebuild/output/src864945534/src/internal/pkg/cli/deploy/env.go:112 +0x2cd
github.com/aws/copilot-cli/internal/pkg/cli.newPackageEnvOpts.func2()
        /codebuild/output/src864945534/src/internal/pkg/cli/env_package.go:112 +0xbb
github.com/aws/copilot-cli/internal/pkg/cli.(*packageEnvOpts).Execute(0xc000119ad0)
        /codebuild/output/src864945534/src/internal/pkg/cli/env_package.go:153 +0x24d
github.com/aws/copilot-cli/internal/pkg/cli.run({0x1f0cd70, 0xc000119ad0})
        /codebuild/output/src864945534/src/internal/pkg/cli/cli.go:98 +0x59
github.com/aws/copilot-cli/internal/pkg/cli.buildEnvPkgCmd.func1(0x0?, {0x0?, 0x0?, 0x0?})
        /codebuild/output/src864945534/src/internal/pkg/cli/env_package.go:282 +0x65
github.com/aws/copilot-cli/internal/pkg/cli.runCmdE.func1(0xc000426000?, {0x2caa6f0?, 0x0?, 0x0?})
        /codebuild/output/src864945534/src/internal/pkg/cli/cli.go:72 +0x7b
github.com/spf13/cobra.(*Command).execute(0xc000426000, {0x2caa6f0, 0x0, 0x0})
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003d1400)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:918
main.main()
        /codebuild/output/src864945534/src/cmd/copilot/main.go:34 +0x25

This happens on both macOS M1 and Linux x64, with version 1.22.0.

Oct 07 '22 17:10 benjaminpottier

I was able to get the package command working again by downgrading to 1.21.1. I can see from CloudTrail that it's trying to update the root policy for the dev EnvManagerRole with a policy that has two identical Sids, "PatchPutObjectsToArtifactBucket".
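A quick way to confirm this kind of problem is to scan the policy document from the CloudTrail event for repeated Sids. This is a minimal Python sketch; the function name is hypothetical, and the sample statements are placeholders rather than the real policy.

```python
from collections import Counter


def duplicate_sids(policy: dict) -> list[str]:
    """Return the Sids that appear more than once in an IAM policy document."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    counts = Counter(s["Sid"] for s in statements if "Sid" in s)
    return sorted(sid for sid, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Placeholder statements; only the Sid values matter for this check.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "PatchPutObjectsToArtifactBucket", "Effect": "Allow", "Action": "s3:PutObject", "Resource": "*"},
            {"Sid": "SomeOtherStatement", "Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"},
            {"Sid": "PatchPutObjectsToArtifactBucket", "Effect": "Allow", "Action": "s3:PutObject", "Resource": "*"},
        ],
    }
    print(duplicate_sids(policy))
```

IAM rejects such a document with the MalformedPolicyDocument error shown above, so any non-empty result explains the failed update.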

Oct 07 '22 19:10 benjaminpottier

Sorry, but I'm really hoping to work through this issue today so I can have a solution to move forward with the rest of my environments.

I fixed the EnvManagerRole issue, but now when I try to deploy the environment I get:

Export insuredportal-dev-SubDomain cannot be deleted as it is in use by insuredportal-dev-graphql, insuredportal-dev-ivr-server and insuredportal-dev-spa (and 1 more)

Also, I'm really worried this is in a state where I have to delete and re-create my dev environment which would be extremely disruptive. It also doesn't give me a lot of confidence in moving forward with the rest of my environments. Any help is appreciated.

Oct 07 '22 19:10 benjaminpottier

I fixed the EnvManagerRole issue

Awesome!

Export insuredportal-dev-SubDomain cannot be deleted

Hmm, is there a domain that's associated with the application? Was the application created with copilot app init --domain? That error means that for some reason copilot couldn't find a domain name associated with the application, but the stack seems to think there should be one.

To troubleshoot, in the application account, if you go to AWS Systems Manager > Parameter Store, do you see a domain name value for your application, or is it an empty string?

Oct 07 '22 20:10 efekarakus

I do see the domain value. Could it be that "skipping" those resources to fix the failed rollback broke something? I'm also noticing things missing, like the S3Bucket for copilot-specific functions, when I run env package -n dev. The stack seems really out of whack.

Oct 07 '22 20:10 benjaminpottier

💭 I'm trying to think about what could be the issue here.

I'm confused because if there is a domain value, then the insuredportal-dev-SubDomain export shouldn't be getting deleted. When you run copilot env package, what are the lines that get deleted? Do you see an EnvironmentSubdomain output written there?

Oct 07 '22 20:10 efekarakus

S3Key and S3Bucket are empty for CertificateValidationFunction, CustomDomainFunction, and DNSDelegationFunction.

This is what I'm seeing for EnvironmentSubdomain:

  EnvironmentSubdomain:
    Condition: DelegateDNS
    Value: !Sub ${EnvironmentName}.${AppName}.${AppDNSName}
    Description: The domain name of this environment.
    Export:
      Name: !Sub ${AWS::StackName}-SubDomain

Oct 07 '22 20:10 benjaminpottier

Ah, the empty values should be okay. If you use copilot env package --upload-assets, those values will be filled.

Would you mind running copilot env deploy again and giving a screenshot of the Events tab in CloudFormation for the resources?

Oct 07 '22 20:10 efekarakus

Screen Shot 2022-10-07 at 4 44 12 PM

I'm hitting the policy issue again. I have to update the stack directly and remove the PatchPutObjectsToArtifactBucket statement to get past the error. Is there a better way?

Oct 07 '22 20:10 benjaminpottier

Gotcha, in the CloudFormation template right now what do you see for the environment's version?

Description: CloudFormation environment template for infrastructure shared among Copilot workloads.
Metadata:
  Version: v1.12.2 # What is this value for you?

I think for now, if you update the IAM role directly in the console by giving it a different Sid for PatchPutObjectsToArtifactBucket (like PatchPutObjectsToArtifactBucket-backup), that should unblock the stack.

Oct 07 '22 20:10 efekarakus

Yes, I'm seeing v1.12.2 ... I updated the Sid in the console but I'm still hitting the same error. The issue seems to be that it's trying to create a policyDocument with two Sids that are the same (PatchPutObjectsToArtifactBucket).

Oct 07 '22 20:10 benjaminpottier

🤯 and this error happens with copilot version number v1.21.1? That permission only gets inserted if the template version is less than v1.9.0: https://github.com/aws/copilot-cli/blob/a830133d6615247bff5e3da4562e19936e2d29f8/internal/pkg/cli/deploy/patch/env.go#L133

So I don't get why it's trying to insert it twice, it should have detected that the version of the template already has the policy and doesn't need to update it again.
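The version gate described here can be sketched as a numeric comparison of the template's Metadata.Version against v1.9.0. This Python sketch is only an illustration of the idea; Copilot's real check lives in the Go file linked above, and the function names here are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a tag such as "v1.12.2" into (1, 12, 2) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def needs_artifact_bucket_patch(template_version: str, threshold: str = "v1.9.0") -> bool:
    """Return True when the template predates the threshold version.

    Comparing tuples of ints avoids the string-comparison trap where
    "v1.12.2" sorts before "v1.9.0".
    """
    return parse_version(template_version) < parse_version(threshold)


if __name__ == "__main__":
    for version in ("v1.8.0", "v1.9.0", "v1.12.2"):
        print(version, needs_artifact_bucket_patch(version))
```

Under a gate like this, a template that reports an old version is patched again even if the role already carries the statement, which is one way a duplicate Sid could arise.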

Would you mind sharing a screenshot of the top of your environment template?

Oct 07 '22 21:10 efekarakus

Okay, so maybe this is the issue then?

Screen Shot 2022-10-07 at 5 23 36 PM

That's a screenshot of the currently deployed stack.

Oct 07 '22 21:10 benjaminpottier

ah yeah!

Like you suspect, I think while skipping the rollback the permission got retained and the version remained v1.8.0.

What I am confused by is if we were to delete the PatchPutObjectsToArtifactBucket inlined policy from the IAM console, I feel like copilot env deploy then should have worked? Because the version is v1.8.0 and it will try to add the new sid, and then move forward. Are we certain that deleting the policy in the console and running the command results in the duplicate sid issue?

Oct 07 '22 21:10 efekarakus

I just deleted the PatchPutObjectsToArtifactBucket policy and tried again. Same error ☹️.

Oct 07 '22 21:10 benjaminpottier

😭 I am so sorry about this churn. OK, then we'll trick copilot and upgrade the version of the template manually to v1.12.2.

Would you mind editing the template directly in CloudFormation and overriding the value from v1.8.0 to v1.12.2 and updating the stack?

CloudFormation won't allow updates only to the Metadata field, therefore to force update the stack, Copilot usually adds an output like this to the template:

Outputs:
  LastForceDeployID:
    Value: "44c1f39f-5505-4ca8-98c6-3755050626bb" # some random value that will force CloudFormation to update.
    Description: Optionally force the template to update when no immediate resource change is present.
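Any value that differs from the one currently deployed will do for LastForceDeployID. A minimal Python sketch that renders the snippet with a fresh random UUID; the function name is hypothetical, and if the template already has an Outputs section, only the LastForceDeployID entry should be merged into it.

```python
import uuid


def force_deploy_output() -> str:
    """Render an Outputs entry whose value changes on every call."""
    return (
        "Outputs:\n"
        "  LastForceDeployID:\n"
        f'    Value: "{uuid.uuid4()}"\n'
        "    Description: Optionally force the template to update when no immediate resource change is present.\n"
    )


if __name__ == "__main__":
    print(force_deploy_output())
```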

Oct 07 '22 21:10 efekarakus

It's okay. I appreciate your helping (and I especially appreciate this app you all have built).

So, changing just the metadata won't work because CloudFormation complains that there are no changes when I try to update the stack with a template that has only the metadata change. Should I also add a LastForceDeployID to get past this?

Oct 07 '22 22:10 benjaminpottier

Should I also add a LastForceDeployID to get past this?

Yup, exactly. The LastForceDeployID output shown above should get you past it.

Oct 07 '22 22:10 efekarakus

Metadata is fixed.

Screen Shot 2022-10-07 at 6 14 50 PM

Should I attempt to deploy?

Oct 07 '22 22:10 benjaminpottier

Yup, go for it!

Oct 07 '22 22:10 efekarakus

We're back! Thank you so much. But, what happened? Maybe I did something wrong the first time? I'm done for the week now but I am planning on moving forward with the rest of my environments next week.

Thank you again!

Oct 07 '22 22:10 benjaminpottier

Awesome!!!

Ok for the other environments, I think as long as you use copilot version v1.21.1 instead of v1.21.0 the upgrades should be smooth 🤞 !

But, what happened? Maybe I did something wrong the first time?

When using env deploy with v1.21.0, we had missed a case while transforming the service discovery settings and had to put out a patch release, v1.21.1.

I'm still not clear why the stack ended up in UPDATE_ROLLBACK_FAILED, as I don't have access to the events from CloudFormation. But during the deployment with v1.21, the EnvManagerRole was successfully upgraded with the new permissions; when the rollback then failed, we ended up skipping that resource to continue the rollback.
I think we shouldn't have skipped it, because the PatchPutObjectsToArtifactBucket policy remained in the IAM role while the Metadata.Version of the template rolled back to v1.8. So the stack was in this awkward state where the IAM role had the v1.12.2 policies but the other resources were on the v1.8 version. Once we tricked copilot into no longer trying to update the IAM role, copilot env deploy got unblocked and deployed successfully 😌

Oct 07 '22 22:10 efekarakus

I'll close the issue now, but if you run into any other problems during the upgrades feel free to open another issue!

Have a good weekend 🎉

Oct 07 '22 22:10 efekarakus

I just encountered the same issue trying to run copilot env deploy using copilot v1.23.0. I didn't need to roll back my stack manually; instead, the stack was getting stuck in the "UPDATE_ROLLBACK_COMPLETE" state with a red "X" next to it, and copilot would not finish the deploy.

copilot env deploy
Only found one environment, defaulting to: development
✘ Unable to update the environment's manager role with upload artifacts permission
✘ upload artifacts for environment development: ensure env manager role has permissions to upload: update environment template with PutObject permissions: wait until stack ehbackend-development update is complete: ResourceNotReady: failed waiting for successful resource state

Screen Shot 2022-11-03 at 4 09 47 PM

Following the instructions in https://github.com/aws/copilot-cli/issues/4069#issuecomment-1272114955 fixed it for me and I was able to deploy through copilot afterwards.

Nov 03 '22 20:11 mlazar-endear
