copilot-cli
env deploy after CLI update
We're in the process of moving to the new environments manifest. We moved from CLI version 1.17.0 to 1.21.0. I generated the manifest for our dev environment per the instructions from the blog post and then ran copilot env deploy --name dev.
However, there were a number of failures during the update that we're not sure how to move past:
- Creating the infrastructure for the insuredportal-dev environment. [update rollback failed] [102.3s]
Export insuredportal-dev-ServiceDiscoveryNamespaceID cannot be updated as it is in use by insuredportal-dev-graphql, insuredportal-dev-ivr-server and insuredportal-dev-spa (and 1 more)
The following resource(s) failed to update: [Cluster, DNSDelegationFunction, CertificateValidationFunction, EnvironmentManagerRole].
- An ECS cluster to group your services [update failed] [2.6s]
Resource handler returned message: "Error occurred during operation 'Settings can only be modified, not removed. Required Settings: [containerInsights]'." (RequestToken: d3ec517c-99fd-a696-be4f-1d1a32c57532, HandlerErrorCode: GeneralServiceException)
- An IAM Role to describe resources in your environment [update failed] [20.9s]
Resource update cancelled
Hi @benjaminpottier!
Oh no, that is super strange, each environment should be getting its own service discovery namespace.
Can you tell me a little bit more about the setup:
- Is the dev environment importing a VPC?
- Would you mind sharing the environment manifest file?
- When you run copilot env package, do you see a difference in the template generated compared to what's stored in CloudFormation? (I use a tool like https://www.yamldiff.com/ to highlight the differences between the two templates.)
- Were there any modifications done to the resources, such as the service discovery namespace, outside of Copilot? Through the AWS CLI or Console, for example.
- All our environments used the VPC created by copilot.
- All that is in the manifest file is "name: dev" and "type: Environment". I generated it from the env show command.
- YAML diff is complaining about the format from the env package output. See:
Error in left input: unknown tag !<!Ref> at line 46, column 38:
... Not [!Equals [ !Ref ALBWorkloads, "" ]]
- We have made modifications outside copilot, but not to the service discovery. It might be worth noting that our service discovery namespaces never had the environment included in them before, except for our prod environment. We always thought this was strange. What I mean is, in our dev, test, and model environments we have <app>.local and in prod we have <env>.<app>.local.
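Since the YAML diff tool chokes on CloudFormation tags such as !Ref, one workaround is to compare the two templates as plain text instead of parsing them. Below is a minimal sketch using Python's standard difflib; the function name diff_templates and the sample strings are made up for illustration, and it assumes you have already saved the deployed template and the copilot env package output as text.

```python
import difflib


def diff_templates(deployed: str, packaged: str) -> list[str]:
    """Return a unified diff of two templates, compared line by line as text.

    Nothing is parsed as YAML, so CloudFormation tags like !Ref or !Sub
    cannot trigger an "unknown tag" error.
    """
    return list(
        difflib.unified_diff(
            deployed.splitlines(),
            packaged.splitlines(),
            fromfile="deployed",
            tofile="packaged",
            lineterm="",
        )
    )


# Hypothetical sample input, only to show the shape of the result.
deployed = "Metadata:\n  Version: v1.8.0\nValue: !Ref ALBWorkloads\n"
packaged = "Metadata:\n  Version: v1.12.2\nValue: !Ref ALBWorkloads\n"
for line in diff_templates(deployed, packaged):
    print(line)
```

A text diff is sensitive to key ordering and whitespace, so it can be noisier than a structural diff, but it works on any template.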
Ohh!! got it!!
Can you try the release with v1.21.1 instead?
We had discovered a bug in our translation of the manifest, here is the snippet from the release notes:
Preserve existing service discovery endpoint (https://github.com/aws/copilot-cli/pull/3949)
In the transition from env upgrade to env deploy, we lost the preservation of the ServiceDiscoveryEndpoint parameter and instead assumed the [app].[env].local format. However, environments that predated our v1.9.0 release have [app].local-formatted ServiceDiscoveryEndpoint parameters, and therefore were erroring out when updates were attempted. This fix preserves the existing value when env deploy is run.
I would, but I'm unable to because the stack is in an "UPDATE_ROLLBACK_FAILED state and can not be updated."
I have the option to "Continue update rollback" but am hesitant because of the following warning:
After the rollback is complete, the state of the skipped resources will be inconsistent with the state of the resources in the stack template. Before performing another stack update, you must update the stack or resources to be consistent with each other. If you don't, subsequent stack updates might fail, and the stack will become unrecoverable.
What would you suggest?
Gotcha, this scenario sounds very similar to this thread: https://gitter.im/aws/copilot-cli?at=6306516cb16e8236e3287045
In that situation, a client continued the rollback, skipping the Cluster and EnvironmentManagerRole resources, which got them to UPDATE_ROLLBACK_COMPLETE. Afterwards, it looks like the command could succeed again with v1.21.1.
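For reference, the same "continue update rollback while skipping resources" action is available from the AWS CLI as aws cloudformation continue-update-rollback with --resources-to-skip. The sketch below only builds and prints that command; the helper name continue_rollback_command is made up, and the stack name insuredportal-dev is an assumption inferred from the export names in the error output. Keep the console warning in mind: skipped resources end up out of sync with the template, which is exactly what caused trouble later in this thread for EnvironmentManagerRole.

```python
import shlex


def continue_rollback_command(stack_name: str, skip: list[str]) -> list[str]:
    """Build the AWS CLI argv that continues a failed update rollback.

    `skip` holds the logical IDs of resources CloudFormation should mark as
    rolled back without actually touching them.
    """
    cmd = [
        "aws", "cloudformation", "continue-update-rollback",
        "--stack-name", stack_name,
    ]
    if skip:
        cmd += ["--resources-to-skip", *skip]
    return cmd


# Assumed stack name; logical IDs taken from the failed-update message.
cmd = continue_rollback_command("insuredportal-dev", ["Cluster", "EnvironmentManagerRole"])
print(shlex.join(cmd))
```

Printing the command first makes it easy to review what will be skipped before running it by hand.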
That kind of worked. But now I'm getting:
Statement IDs (SID) in a single policy must be unique. (Service: AmazonIdentityManagement; Status Code: 400; Error Code: MalformedPolicyDocument; Request ID: 4c4def8e-cca8-4042-884a-850a1a1bcd7a; Proxy: null)
The following resource(s) failed to update: [EnvironmentManagerRole].
Also, when I run env package now I get the following:
Only found one environment, defaulting to: dev
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x12ddede]
goroutine 1 [running]:
github.com/aws/copilot-cli/internal/pkg/describe.NewEnvDescriber({{0xc0004c8310, 0xd}, {0xc000694748, 0x3}, 0x0, {0x0, 0x0}, {0x0, 0x0}})
/codebuild/output/src864945534/src/internal/pkg/describe/env.go:78 +0x7e
github.com/aws/copilot-cli/internal/pkg/cli/deploy.NewEnvDeployer(0xc0008d4c90)
/codebuild/output/src864945534/src/internal/pkg/cli/deploy/env.go:112 +0x2cd
github.com/aws/copilot-cli/internal/pkg/cli.newPackageEnvOpts.func2()
/codebuild/output/src864945534/src/internal/pkg/cli/env_package.go:112 +0xbb
github.com/aws/copilot-cli/internal/pkg/cli.(*packageEnvOpts).Execute(0xc000119ad0)
/codebuild/output/src864945534/src/internal/pkg/cli/env_package.go:153 +0x24d
github.com/aws/copilot-cli/internal/pkg/cli.run({0x1f0cd70, 0xc000119ad0})
/codebuild/output/src864945534/src/internal/pkg/cli/cli.go:98 +0x59
github.com/aws/copilot-cli/internal/pkg/cli.buildEnvPkgCmd.func1(0x0?, {0x0?, 0x0?, 0x0?})
/codebuild/output/src864945534/src/internal/pkg/cli/env_package.go:282 +0x65
github.com/aws/copilot-cli/internal/pkg/cli.runCmdE.func1(0xc000426000?, {0x2caa6f0?, 0x0?, 0x0?})
/codebuild/output/src864945534/src/internal/pkg/cli/cli.go:72 +0x7b
github.com/spf13/cobra.(*Command).execute(0xc000426000, {0x2caa6f0, 0x0, 0x0})
/go/pkg/mod/github.com/spf13/[email protected]/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003d1400)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:918
main.main()
/codebuild/output/src864945534/src/cmd/copilot/main.go:34 +0x25
This happens on both macOS M1 and Linux x64 with version 1.22.0.
I was able to get the package command working again by downgrading to 1.21.1. I can see from CloudTrail that it's trying to update the root policy for the dev EnvManagerRole with a policy that does have two statements with the same Sid, "PatchPutObjectsToArtifactBucket".
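To confirm this kind of finding without reading the policy by eye, a small check for repeated Sids can help. This is a minimal sketch assuming the policy document has been copied out of CloudTrail or the IAM console as JSON; the function name duplicate_sids and the sample policy (including the second Sid and the actions) are made up for illustration.

```python
import json
from collections import Counter


def duplicate_sids(policy_json: str) -> list[str]:
    """Return every Sid that appears more than once in an IAM policy document."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):
        # IAM also accepts a single statement given as an object.
        statements = [statements]
    counts = Counter(s["Sid"] for s in statements if "Sid" in s)
    return sorted(sid for sid, n in counts.items() if n > 1)


# Hypothetical policy with the duplicated Sid described above.
policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PatchPutObjectsToArtifactBucket", "Effect": "Allow", "Action": "s3:PutObject", "Resource": "*"},
        {"Sid": "SomeOtherStatement", "Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"},
        {"Sid": "PatchPutObjectsToArtifactBucket", "Effect": "Allow", "Action": "s3:PutObject", "Resource": "*"},
    ],
})
print(duplicate_sids(policy))
```

Any Sid in the result would trigger the MalformedPolicyDocument error shown earlier.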
Sorry, but I'm really hoping to work through this issue today so I can have a solution to move forward with the rest of my environments.
I fixed the EnvManagerRole issue, but now when I try to deploy the environment I get:
Export insuredportal-dev-SubDomain cannot be deleted as it is in use by insuredportal-dev-graphql, insuredportal-dev-ivr-server and insuredportal-dev-spa (and 1 more)
Also, I'm really worried this is in a state where I have to delete and re-create my dev environment which would be extremely disruptive. It also doesn't give me a lot of confidence in moving forward with the rest of my environments. Any help is appreciated.
I fixed the EnvManagerRole issue
Awesome!
Export insuredportal-dev-SubDomain cannot be deleted
Hmm, is there a domain that's associated with the application? Was the application created with copilot app init --domain? That error means that for some reason copilot couldn't find a domain name associated with the application, but the stack seems to think there should be one.
To troubleshoot, in the application account, if you go to AWS Systems Manager > Parameter Store, do you see a domain name value for your application, or is it an empty string?
I do see the domain value. Could it be that when I "Skipped" those resources to fix the failed rollback, that might have broken something? I'm also noticing things missing, like S3Bucket for copilot-specific functions, when I run env package -n dev. The stack seems really out of whack.
I'm trying to think about what could be the issue here.
I'm confused because if there is a domain value, then the insuredportal-dev-SubDomain export shouldn't be getting deleted.
When you run copilot env package, what are the lines that get deleted? Do you see an EnvironmentSubdomain output written there?
S3Key and S3Bucket are empty for CertificateValidationFunction, CustomDomainFunction, and DNSDelegationFunction.
This is what I'm seeing for EnvironmentSubdomain:
EnvironmentSubdomain:
  Condition: DelegateDNS
  Value: !Sub ${EnvironmentName}.${AppName}.${AppDNSName}
  Description: The domain name of this environment.
  Export:
    Name: !Sub ${AWS::StackName}-SubDomain
Ah, the empty values should be okay. If you use copilot env package --upload-assets, those values will be filled.
Would you mind running copilot env deploy again and giving a screenshot of the Events tab in CloudFormation for the resources?
I'm hitting the policy issue again. I have to update the stack directly and remove the PatchPutObjectsToArtifactBucket statement to get past the error. Is there a better way?
Gotcha, in the CloudFormation template right now what do you see for the environment's version?
Description: CloudFormation environment template for infrastructure shared among Copilot workloads.
Metadata:
  Version: v1.12.2 # What is this value for you?
I think for now, if you update the IAM role directly in the console by giving PatchPutObjectsToArtifactBucket a different sid (like PatchPutObjectsToArtifactBucket-backup), that should unblock the stack.
Yes, I'm seeing v1.12.2 ... I updated the sid in the console but I'm still hitting the same error. The issue seems to be that it's trying to create a policy document with two sids that are the same (PatchPutObjectsToArtifactBucket).
🤯 and this error happens with copilot version number v1.21.1?
That permission only gets inserted if the template version is less than v1.9.0:
https://github.com/aws/copilot-cli/blob/a830133d6615247bff5e3da4562e19936e2d29f8/internal/pkg/cli/deploy/patch/env.go#L133
So I don't get why it's trying to insert it twice, it should have detected that the version of the template already has the policy and doesn't need to update it again.
Would you mind sharing a screenshot of the top of your environment template?
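To make the version gate concrete, here is a rough Python illustration of the check described above. It is not Copilot's code (that lives in the Go file linked above); the function names are made up, and the v1.9.0 threshold is taken from the statement in this comment. The main point is that versions have to be compared numerically, component by component.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a string like "v1.12.2" into a comparable tuple like (1, 12, 2)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def needs_put_object_patch(template_version: str, threshold: str = "v1.9.0") -> bool:
    """Report whether a template is old enough to get the permission patch."""
    return parse_version(template_version) < parse_version(threshold)


print(needs_put_object_patch("v1.8.0"))   # a template older than the threshold
print(needs_put_object_patch("v1.12.2"))  # a template newer than the threshold
```

A plain string comparison would get this wrong, because "v1.12.2" sorts before "v1.9.0" alphabetically. Under this reading, a stack whose template still reports v1.8.0 would be patched again on every deploy, even if the role already carries the statement.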
Okay, so maybe this is the issue then?
That's a screenshot of the currently deployed stack.
ah yeah!
Like you suspect, I think while skipping the rollback the permission got retained and the version remained v1.8.0.
What I am confused by is that if we were to delete the PatchPutObjectsToArtifactBucket inlined policy from the IAM console, I feel like copilot env deploy should then have worked, because the version is v1.8.0 and it will try to add the new sid and then move forward. Are we certain that deleting the policy in the console and running the command results in the duplicate sid issue?
I just deleted the PatchPutObjectsToArtifactBucket policy and tried again. Same error ☹️.
I am so sorry about this churn. OK, then we'll trick copilot and upgrade the version of the template manually to v1.12.2.
Would you mind editing the template directly in CloudFormation, overriding the value from v1.8.0 to v1.12.2, and updating the stack?
CloudFormation won't allow updates only to the Metadata field; therefore, to force update the stack, Copilot usually adds an output like this to the template:
Outputs:
  LastForceDeployID:
    Value: "44c1f39f-5505-4ca8-98c6-3755050626bb" # some random value that will force CloudFormation to update.
    Description: Optionally force the template to update when no immediate resource change is present.
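The two manual edits described here (bump Metadata.Version, add a fresh LastForceDeployID output) can be sketched in Python as follows. This is only an illustration under a big assumption: the template is treated as an already-parsed dictionary, whereas the real environment template is YAML with CloudFormation tags, so in practice editing it by hand in the console may be simpler. The function name force_update is made up.

```python
import copy
import uuid


def force_update(template: dict, version: str) -> dict:
    """Return a copy of the template with a new version and a force-deploy output.

    The random LastForceDeployID value gives CloudFormation a real change to
    apply, since a Metadata-only edit is rejected as "no changes".
    """
    updated = copy.deepcopy(template)
    updated.setdefault("Metadata", {})["Version"] = version
    updated.setdefault("Outputs", {})["LastForceDeployID"] = {
        "Value": str(uuid.uuid4()),
        "Description": (
            "Optionally force the template to update when no immediate "
            "resource change is present."
        ),
    }
    return updated


# Hypothetical, heavily trimmed template.
template = {"Metadata": {"Version": "v1.8.0"}, "Outputs": {}}
patched = force_update(template, "v1.12.2")
print(patched["Metadata"]["Version"])
```

Working on a deep copy keeps the original around, so the two can be compared before anything is uploaded.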
It's okay. I appreciate your helping (and I especially appreciate this app you all have built).
So, changing just the metadata won't work, because CloudFormation complains that there are no changes when I try to update the stack with a template that contains only the metadata change. Should I also add a LastForceDeployID to get past this?
Should I also add a LastForceDeployID to get past this?
Yup, exactly. The LastForceDeployID like shown above should get past it.
Metadata is fixed.
Should I attempt to deploy?
Yup, go for it!
We're back! Thank you so much. But, what happened? Maybe I did something wrong the first time? I'm done for the week now but I am planning on moving forward with the rest of my environments next week.
Thank you again!
Awesome!!!
Ok, for the other environments, I think as long as you use copilot version v1.21.1 instead of v1.21.0, the upgrades should be smooth!
But, what happened? Maybe I did something wrong the first time?
When using env deploy with v1.21.0, we had missed a case while transforming the service discovery settings and had to put out a patch release, v1.21.1.
I'm still not clear why the stack ended up in UPDATE_ROLLBACK_FAILED, as I don't have access to the events from CloudFormation. But during the deployment with v1.21, the EnvManagerRole was successfully upgraded with the new permissions; when the rollback then failed, we ended up skipping the resource to continue the rollback.
I think we shouldn't have skipped it, because the PatchPutObjectsToArtifactBucket policy remained in the IAM role but the Metadata.Version of the template rolled back to v1.8.
So the template was in this awkward state where the IAM role had the v1.12.2 policies but the other resources were on the v1.8 version.
Once we tricked copilot into no longer trying to update the IAM role, copilot env deploy got unblocked and successfully deployed.
I'll close the issue now, but if you run into any other problems during the upgrades feel free to open another issue!
Have a good weekend!
I just encountered the same issue trying to run copilot env deploy using copilot v1.23.0. I didn't need to roll back my stack manually; instead, the stack was getting stuck in the state "UPDATE_ROLLBACK_COMPLETE" with a red "X" next to it, and copilot would not finish the deploy.
copilot env deploy
Only found one environment, defaulting to: development
✘ Unable to update the environment's manager role with upload artifacts permission
✘ upload artifacts for environment development: ensure env manager role has permissions to upload: update environment template with PutObject permissions: wait until stack ehbackend-development update is complete: ResourceNotReady: failed waiting for successful resource state
Following the instructions in https://github.com/aws/copilot-cli/issues/4069#issuecomment-1272114955 fixed it for me and I was able to deploy through copilot afterwards.