CustomResource table create fails on first attempt when backups are enabled

Open cBiscuitSurprise opened this issue 7 months ago • 5 comments

Environment information

System:
  OS: Linux 5.15 Ubuntu 24.04.2 LTS 24.04.2 LTS (Noble Numbat)
  CPU: (12) arm64 unknown
  Memory: 28.50 GB / 31.17 GB
  Shell: /bin/bash
Binaries:
  Node: 22.15.0 - ~/.local/share/mise/installs/node/22.15.0/bin/node
  Yarn: undefined - undefined
  npm: 11.3.0 - ~/.local/share/mise/installs/node/22.15.0/bin/npm
  pnpm: undefined - undefined
NPM Packages:
  @aws-amplify/auth-construct: 1.6.0
  @aws-amplify/backend: 1.14.0
  @aws-amplify/backend-auth: 1.5.0
  @aws-amplify/backend-cli: 1.4.8
  @aws-amplify/backend-data: 1.4.0
  @aws-amplify/backend-deployer: 1.1.15
  @aws-amplify/backend-function: 1.12.1
  @aws-amplify/backend-output-schemas: 1.4.0
  @aws-amplify/backend-output-storage: 1.1.4
  @aws-amplify/backend-secret: 1.1.5
  @aws-amplify/backend-storage: 1.2.4
  @aws-amplify/cli-core: 1.2.3
  @aws-amplify/client-config: 1.5.5
  @aws-amplify/deployed-backend-client: 1.5.0
  @aws-amplify/form-generator: 1.0.3
  @aws-amplify/model-generator: 1.0.12
  @aws-amplify/platform-core: 1.6.0
  @aws-amplify/plugin-types: 1.8.0
  @aws-amplify/sandbox: 1.2.10
  @aws-amplify/schema-generator: 1.2.7
  aws-amplify: 6.12.2
  aws-cdk: 2.177.0
  aws-cdk-lib: 2.177.0
  typescript: 5.5.4
No AWS environment variables
No CDK environment variables

Describe the bug

We have enabled PITR on our backend tables (table.pointInTimeRecoveryEnabled = true;). However, when new tables are added, the resource regularly fails to create with the following error:

Received response status [FAILED] from custom resource. Message returned: Backups are being enabled for the table: Todo-...-NONE. Please retry later (RequestId: ...)

This issue has come up during development, but a redeploy usually succeeds. We're currently trying to make our first deployment from dev to production where we're now creating several tables for the first time. We've re-tried the deployment several times and a different table seems to fail with this error each time. It seems that there's something in the CustomResource that isn't playing nice with PITR.

Any guidance here would be appreciated. At this point we're probably going to have to break the stack down and do one table at a time...

Reproduction steps

Deploy a new Model with PITR enabled.

cBiscuitSurprise avatar Apr 25 '25 17:04 cBiscuitSurprise

Hi @cBiscuitSurprise, Thank you for reporting this issue. It appears to be a timing issue between the creation of the DynamoDB table and the enablement of Point-in-Time Recovery (PITR). To help us better understand the root cause, could you please share your backend.ts file?

AnilMaktala avatar Apr 28 '25 01:04 AnilMaktala

Sorry, I got distracted with other stuff. This is blocking us again. Our backend infra is spread across many files (not just backend.ts). I can try to create a repo that replicates our issue. I think it's as simple as: create table, enable backups, deploy. Anytime I add new tables, this blocks the deployment. If I create the table first, deploy, then enable backups, and deploy again, it works, but that's a pain.

Here's my util for enabling backups:

export function secureTables(backend: ChBackend) {
  const { amplifyDynamoDbTables } = backend.data.resources.cfnResources;
  for (const table of Object.values(amplifyDynamoDbTables)) {
    table.pointInTimeRecoveryEnabled = true;
  }
}

It also gets in the way anytime we add/remove relationships, since the table gets deleted and recreated (which is a separate nuisance ... it's really annoying to have our data blown away if we add/remove a relationship).

And then sometimes we get stuck on this error:

Received response status [FAILED] from custom resource. Message returned: Execution Already Exists: 'arn:aws:states:us-east-2:339713049787:execution:AmplifyTableWaiterStateMachine060600BC-PxonGMRHv92D:11564edb-9428-4258-9584-98240fb749ee'

When that happens, we have to go delete the step function and recreate it. This is really making us regret using Amplify. It's really going to suck the day that we irreversibly destroy our own production app just because of a simple table change.

cBiscuitSurprise avatar May 14 '25 19:05 cBiscuitSurprise

Stuck on the same issue. Enabling PITR is the last line in backend.ts. It has not happened in Sandbox.

prath-acn avatar May 22 '25 16:05 prath-acn

Facing the same issue.

NourDh avatar Jun 12 '25 09:06 NourDh

Current vs Expected Behavior

Current Behavior (PITR Enabled)

// User enables PITR in backend.ts
export function secureTables(backend: ChBackend) {
  const { amplifyDynamoDbTables } = backend.data.resources.cfnResources;
  for (const table of Object.values(amplifyDynamoDbTables)) {
    table.pointInTimeRecoveryEnabled = true;
  }
}

// First deployment attempt
❌ Received response status [FAILED] from custom resource. 
   Message returned: Backups are being enabled for the table: Todo-...-NONE. 
   Please retry later (RequestId: ...)

// Retry deployment
✅ Deployment succeeds

Expected Behavior

// User enables PITR in backend.ts
export function secureTables(backend: ChBackend) {
  const { amplifyDynamoDbTables } = backend.data.resources.cfnResources;
  for (const table of Object.values(amplifyDynamoDbTables)) {
    table.pointInTimeRecoveryEnabled = true;
  }
}

// First deployment attempt
✅ Deployment succeeds consistently

Hi @cBiscuitSurprise,

Thank you for reporting this issue. This is a known timing issue with Point-in-Time Recovery (PITR) enablement during fresh table deployments. We've identified this as a race condition in the custom resource that handles PITR configuration.

Root Cause: The custom resource attempts to enable PITR before the DynamoDB table reaches ACTIVE state, causing the "Backups are being enabled" error. This is why retry deployments typically succeed - the table is ready by then.

Immediate Workaround:

// Deploy in two phases to avoid the timing issue:
// 1. Deploy the new tables first, without calling secureTables(backend).
// 2. Once that deployment succeeds, call secureTables(backend) and deploy again.

export function secureTables(backend: ChBackend) {
  const { amplifyDynamoDbTables } = backend.data.resources.cfnResources;
  for (const table of Object.values(amplifyDynamoDbTables)) {
    table.pointInTimeRecoveryEnabled = true;
  }
}
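
One way to make the two phases explicit is to gate the PITR call behind a flag, so the same code can be deployed twice with different settings. The following is only a sketch: the `enablePitr` parameter, the `PitrConfigurable` type, and the `ENABLE_PITR` variable name are assumptions made for illustration and are not part of Amplify.

```typescript
// Minimal structural type: only the property this sketch touches.
// In a real Amplify Gen 2 backend, the values would come from
// backend.data.resources.cfnResources.amplifyDynamoDbTables.
interface PitrConfigurable {
  pointInTimeRecoveryEnabled?: boolean;
}

/**
 * Enables PITR on every table only when `enablePitr` is true.
 * Phase 1: deploy with enablePitr = false so new tables are created first.
 * Phase 2: redeploy with enablePitr = true to turn on backups.
 * Returns the number of tables that were updated.
 */
function secureTables(
  tables: Record<string, PitrConfigurable>,
  enablePitr: boolean,
): number {
  if (!enablePitr) {
    return 0;
  }
  let updated = 0;
  for (const table of Object.values(tables)) {
    table.pointInTimeRecoveryEnabled = true;
    updated += 1;
  }
  return updated;
}

// Hypothetical wiring in backend.ts (ENABLE_PITR is an assumed variable name):
//   const { amplifyDynamoDbTables } = backend.data.resources.cfnResources;
//   secureTables(amplifyDynamoDbTables, process.env.ENABLE_PITR === "true");
```

This does not remove the underlying race. It only avoids having to edit the code between the first and second deployment.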

This issue affects multiple users and is tracked as a bug. The fix requires:

  1. Adding proper table waiter logic in the custom resource
  2. Implementing retry logic with exponential backoff
  3. Fixing Step Function execution conflicts during retries
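
As a rough illustration of item 2, a retry helper with exponential backoff could look like the sketch below. It is not the Amplify custom resource handler. The helper names (`retryWithBackoff`, `isRetryable`, `backoffDelayMs`) and option names are hypothetical, and the only retry signal it recognizes is the "Please retry later" text from the error message quoted in this issue.

```typescript
// Options for the retry helper; all names here are illustrative.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the second attempt
  maxDelayMs: number; // upper bound on any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Treats the error text quoted in this issue as a retryable condition.
function isRetryable(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return message.includes("Please retry later");
}

// Delay before retry number `retry` (0-based): base * 2^retry, capped at max.
function backoffDelayMs(retry: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** retry);
}

// Runs `operation`, retrying retryable failures until maxAttempts is reached.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      const isLastAttempt = attempt + 1 >= options.maxAttempts;
      if (isLastAttempt || !isRetryable(error)) {
        throw error; // give up: out of attempts or not a transient failure
      }
      await sleep(backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
}
```

In a handler, `operation` would wrap the call that enables PITR on the table. The sketch omits jitter for brevity, and a real handler would also need to keep the total wait within its execution timeout.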

Related Issues: #1654

We encourage community contributions to help resolve this issue. The fix would involve modifying the custom resource handler to properly wait for table readiness before enabling PITR.

Priority: This should be treated as a P2 bug since it affects production deployments and requires manual workarounds, though retry deployments typically succeed.

Thank you for your patience, and we appreciate the detailed reproduction steps and user feedback that helps us understand the scope of this issue.

pahud avatar Aug 20 '25 17:08 pahud