amplify-category-api

API can get stuck with error: Limit on the number of resources in a single stack operation exceeded

Open dpilch opened this issue 1 year ago • 0 comments

How did you install the Amplify CLI?

npm

If applicable, what version of Node.js are you using?

v18.19.1

Amplify CLI Version

12.10.3

What operating system are you using?

Mac

Did you make any manual changes to the cloud resources managed by Amplify? Please describe the changes made.

No

Describe the bug

The Amplify GraphQL API can reach a state from which it may not be recoverable. A single CloudFormation deployment can only touch 2,500 resources. See "Maximum number of CloudFormation resources a nested stack can create, update, or delete per operation" at https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html. This limit is a hard cap and cannot be modified. It is different from the limit of 500 resources per stack.

Due to a design flaw in the GraphQL API construct, every resource in the API is touched with a no-op operation on each deployment. These no-op operations count toward the 2,500 limit. It is possible to reach a state where no modification can be made to a GraphQL API, because adding or removing resources will cross the 2,500 limit. We have seen this error in the ConnectionStack, but it is not confirmed whether this is the only stack that can produce it.

It is not possible for a customer to identify how close they are to the limit. The information is contained within an internal log in CloudFormation. If a customer needs to identify the current count, they will need to open a technical support ticket through AWS.

At this time we have received one report of a customer who reached this state and was not able to recover. It is not clear how the CloudFormation stack was originally deployed above this limit.

Expected behavior

It will not be possible to remove this limit, but there are several options to improve this experience.

  1. Fix the GraphQL API construct so that it does not perform a no-op on every resource on every deployment.
  2. Provide a warning when the API is approaching the limit.
  3. Fail before deploying the CloudFormation templates when exceeding/close to the limit if this can be identified locally.
  4. Provide an automated recovery tool.
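
As a rough illustration of options 2 and 3, the sketch below counts the resources declared in the locally compiled templates under amplify/backend/api/<api-name>/build/stacks and compares the total with the 2,500 cap. It is not part of the Amplify CLI. The function names and the 80% warning threshold are assumptions made for this example. The count is only a proxy: it ignores the root stack template, and CloudFormation's own per-operation accounting is not visible locally.

```python
import json
from pathlib import Path

# Per-operation cap quoted in this issue.
OPERATION_LIMIT = 2500


def count_resources(template: dict) -> int:
    """Return the number of entries in a template's Resources section."""
    return len(template.get("Resources", {}))


def count_build_resources(stacks_dir: str) -> dict:
    """Map each *.json template in stacks_dir to its declared resource count."""
    counts = {}
    for path in sorted(Path(stacks_dir).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            counts[path.name] = count_resources(json.load(handle))
    return counts


def classify(total: int, limit: int = OPERATION_LIMIT, warn_ratio: float = 0.8) -> str:
    """Return "over", "near", or "ok" (warn_ratio is an arbitrary threshold)."""
    if total > limit:
        return "over"
    if total >= warn_ratio * limit:
        return "near"
    return "ok"
```

After amplify api gql-compile, calling count_build_resources on the build/stacks directory and passing the sum of the values to classify gives a coarse early warning before any template is sent to CloudFormation.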

Reproduction steps

We have not created a reliable reproduction at this time. The steps to reproduce are likely:

  1. Create a very large schema that gives the error.
  2. Modify the schema until a deployment succeeds at the 2,500 resource limit.
  3. Attempt to remove or add any resource from the API.
  4. Run amplify push and observe the error.
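
For step 1, a very large schema could be generated rather than written by hand. The sketch below is one hypothetical way to do that. The type names (Model1, Model2, ...) and fields are invented for illustration. It emits simple models without relationships, so it may not exercise the ConnectionStack where the error was observed. How many CloudFormation resources each model produces is not stated in this issue and would need to be found by experiment.

```python
def generate_schema(model_count: int) -> str:
    """Build a GraphQL schema containing model_count simple @model types."""
    blocks = []
    for index in range(1, model_count + 1):
        blocks.append(
            f"type Model{index} @model {{\n"
            "  id: ID!\n"
            "  name: String\n"
            "}\n"
        )
    # Separate the type definitions with a blank line.
    return "\n".join(blocks)
```

The returned string can be written to the project's GraphQL schema file, and model_count adjusted up or down between deployment attempts.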

Project Identifier

No response

Log output

# Put your logs below this line


Additional information

If no modification can be made through amplify push it is still possible to recover some APIs. This is done by manually modifying a child stack for a given model.

  1. Identify a model that does not have connections.
  2. Remove all queries and mutations from this model with @model(queries: null, mutations: null)
    1. This will remove the resolvers for this model and possibly lower the number of operations below the limit.
  3. Run amplify api gql-compile
  4. Open the CloudFormation console and locate the stack that corresponds with the model that was modified.
  5. Select Update
    1. Select Update nested stack and Update stack
    2. Select Replace existing template
    3. Select Upload a template file
    4. Upload the template file from your local project amplify/backend/api/<api-name>/build/stacks/<model-name>.json
    5. Use the defaults for the next two pages.
    6. Before selecting Submit ensure the Change set preview shows the resolvers being removed.
  6. After this deployment is successful attempt amplify push.
    1. If the manual CFN deployment is not successful, the stack may be in a broken state. If the model stack reaches the state UPDATE_ROLLBACK_FAILED, you will need to open a technical support ticket with CloudFormation. Please state that the root stack is in a healthy state but a child stack is in the UPDATE_ROLLBACK_FAILED state.
  7. If the amplify push is successful, begin to remove resources in waves until you can successfully push an update with stack mapping. See https://docs.amplify.aws/javascript/build-a-backend/graphqlapi/modify-amplify-generated-resources/#place-appsync-resolvers-in-custom-named-stacks
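
Before uploading the template in step 5, the expected change can be checked locally. This does not replace reviewing the Change set preview in the console. The sketch below assumes a copy of the currently deployed model stack template has been saved to a local file (for example from the stack's Template tab). It lists the AWS::AppSync::Resolver resources that the rebuilt template no longer declares. The function names are hypothetical.

```python
import json

# CloudFormation resource type for AppSync resolvers.
RESOLVER_TYPE = "AWS::AppSync::Resolver"


def load_template(path: str) -> dict:
    """Read a CloudFormation JSON template from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def removed_resources(deployed: dict, rebuilt: dict,
                      resource_type: str = RESOLVER_TYPE) -> list:
    """Logical IDs of resource_type present in deployed but absent from rebuilt."""
    old = deployed.get("Resources", {})
    new = rebuilt.get("Resources", {})
    return sorted(
        logical_id
        for logical_id, resource in old.items()
        if resource.get("Type") == resource_type and logical_id not in new
    )
```

If removed_resources returns an empty list for the modified model, the rebuilt template would not remove any resolvers, and uploading it is unlikely to lower the operation count.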

Before submitting, please confirm:

  • [X] I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.
  • [X] I have removed any sensitive information from my code snippets and submission.

dpilch · Apr 08 '24 16:04