aws-cdk

Docker asset building fails with Finch - image tagging issues

Open eduborto opened this issue 6 months ago • 3 comments

Describe the bug

CDK fails to properly build and tag Docker images when using Finch as the Docker replacement via CDK_DOCKER=finch. The build process reports success but doesn't create images with the expected tags, causing deployment failures with "image not found" errors during ECR push.

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Last Known Working CDK Library Version

No response

Expected Behavior

When using CDK_DOCKER=finch, CDK should successfully:

  1. Build Docker images using Finch
  2. Tag images with the correct CDK asset hash
  3. Push images to ECR
  4. Deploy the stack successfully

This should work seamlessly as a Docker Desktop replacement.

Current Behavior

CDK reports successful Docker image building but fails during deployment:

  1. CDK reports: MyStack: success: Built MyDockerImage
  2. But then fails with: cdkasset-{hash}: not found
  3. ECR push fails: image not found
  4. Stack deployment fails

Error logs:

MyStack: start: Building MyDockerImage
time="2025-06-14T01:08:05-03:00" level=fatal msg="cdkasset-2b1d94499a1aa9de5707f04c383f484af3111aa7f1bcbf6289093d429bfb74bd: not found"
MyStack: success: Built MyDockerImage
MyStack: start: Publishing MyDockerImage (current_account-current_region)
time="2025-06-14T01:08:06-03:00" level=fatal msg="image \"XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/cdk-hnb659fds-container-assets-XXXXXXXXXXXX-us-west-2:2b1d94499a1aa9de5707f04c383f484af3111aa7f1bcbf6289093d429bfb74bd\": not found"
MyStack: success: Published MyDockerImage (current_account-current_region)
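
The pattern in the log above is a Finch `level=fatal` line followed directly by a CDK `success:` line. A minimal sketch of a check for that pattern, assuming the log format shown above; `findFalseSuccesses` is a hypothetical helper, not part of CDK or Finch:

```typescript
// Hypothetical helper, not part of CDK or Finch.
// Flags a CDK "success:" line that directly follows a Finch "level=fatal" line.
interface FalseSuccess {
  fatalLine: string;
  successLine: string;
}

function findFalseSuccesses(log: string): FalseSuccess[] {
  const hits: FalseSuccess[] = [];
  let previous: string | undefined;
  for (const raw of log.split('\n')) {
    const line = raw.trim();
    if (line.length === 0) {
      continue; // ignore blank lines
    }
    if (previous !== undefined && /level=fatal/.test(previous) && /: success: /.test(line)) {
      hits.push({ fatalLine: previous, successLine: line });
    }
    previous = line;
  }
  return hits;
}

// Shortened sample in the format of the log above (the hash is a placeholder).
const sampleLog = [
  'MyStack: start: Building MyDockerImage',
  'time="2025-06-14T01:08:05-03:00" level=fatal msg="cdkasset-abc123: not found"',
  'MyStack: success: Built MyDockerImage',
].join('\n');

for (const hit of findFalseSuccesses(sampleLog)) {
  console.log(`Suspicious: "${hit.successLine}" follows a fatal error`);
}
```

Piping the output of a deploy into a check like this would make the silent failure visible in CI, though it only matches the exact line ordering seen in this report.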

Reproduction Steps

// 1. Install Finch: brew install finch
// 2. Create CDK project with Docker assets

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

export class CdkFinchReproStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Minimal setup to reproduce Finch Docker asset issue
    const vpc = new ec2.Vpc(this, 'TestVpc', {
      maxAzs: 2,
      natGateways: 0, // Keep costs minimal
    });

    const cluster = new ecs.Cluster(this, 'TestCluster', { vpc });

    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TestTask', {
      memoryLimitMiB: 512,
      cpu: 256,
    });

    // This Docker asset will fail with Finch
    taskDefinition.addContainer('TestContainer', {
      image: ecs.ContainerImage.fromAsset('./docker'),
      memoryLimitMiB: 512,
      logging: ecs.LogDrivers.awsLogs({
        streamPrefix: 'test-container',
        logRetention: cdk.aws_logs.RetentionDays.ONE_DAY,
      }),
    });

    // Optional: Create the service (comment out to just test image building)
    // new ecs.FargateService(this, 'TestService', {
    //   cluster,
    //   taskDefinition,
    //   desiredCount: 1,
    // });
  }
}
// 3. Create Dockerfile in ./docker/
// FROM public.ecr.aws/docker/library/python:3.13-slim
// WORKDIR /app
// RUN echo 'print("Hello from CDK-Finch reproduction test!")' > app.py
// CMD ["python", "app.py"]

// 4. Deploy: CDK_DOCKER=finch cdk deploy --require-approval never

Possible Solution

The issue appears to be in CDK's Docker integration layer when using Finch. Suggested fixes:

  1. Improve error handling: CDK should validate that images are actually created before reporting success
  2. Better Finch integration: Ensure CDK properly handles Finch's output and image tagging
  3. Clearer error messages: When Docker builds fail, provide actionable error messages
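
The first suggestion could look roughly like the sketch below. This is an illustration only, not CDK's actual implementation: `verifyImageBuilt`, `BuildCheck`, and `InspectFn` are hypothetical names, and the sketch assumes that an image lookup such as `finch inspect <tag>` exits with a non-zero code when the image is missing.

```typescript
// Hypothetical sketch, not CDK's actual implementation.
// InspectFn stands in for running `finch inspect <tag>` and returning its exit code
// (assumption: a non-zero exit code means the image was not found).
type InspectFn = (tag: string) => number;

interface BuildCheck {
  tag: string;
  ok: boolean;
  message: string;
}

function verifyImageBuilt(assetHash: string, inspect: InspectFn): BuildCheck {
  const tag = `cdkasset-${assetHash}`;
  const exitCode = inspect(tag);
  if (exitCode !== 0) {
    return {
      tag,
      ok: false,
      message: `fail: image ${tag} not found after build (inspect exit code ${exitCode})`,
    };
  }
  return { tag, ok: true, message: `success: Built ${tag}` };
}

// Fake local image store for illustration; a real InspectFn would shell out to Finch.
const localImages = ['cdkasset-aaa111'];
const fakeInspect: InspectFn = tag => (localImages.indexOf(tag) === -1 ? 1 : 0);

console.log(verifyImageBuilt('aaa111', fakeInspect).message);
console.log(verifyImageBuilt('bbb222', fakeInspect).message);
```

Injecting the lookup keeps the check independent of the container tool, so the same logic would apply to Docker or Finch.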

Working workaround: Pre-build images with correct tags before CDK deployment - see full script in Additional Information.

Additional Information/Context

  • This issue specifically affects users who use Finch instead of Docker Desktop
  • Finch is Amazon's own open-source container development tool
  • The workaround confirms that Finch can successfully build the images when called correctly
  • Issue seems to be in CDK's Docker integration layer, not in Finch itself
  • Affects ECS deployments, Lambda container functions, and any CDK construct using Docker assets
  • High impact: Prevents CDK deployment for users using Docker alternatives

AWS CDK Library version (aws-cdk-lib)

2.201.0

AWS CDK CLI version

2.1018.1

Node.js Version

22.16.0

OS

macOS ARM64

Language

TypeScript

Language Version

TypeScript (5.8.3)

Other information

Environment Details:

  • Finch Version: v1.8.3
  • Docker Alternative: Finch (Docker not installed)
  • Platform: linux/arm64

Related Context:

  • Finch GitHub: https://github.com/runfinch/finch
  • This affects users who cannot or choose not to install Docker Desktop
  • The workaround proves the issue is in CDK's integration, not Finch itself

Working Workaround Script:

// scripts/build-docker.js
const fs = require('fs');
const path = require('path');
const { execSync } = require('child_process');

function buildDockerAssets() {
  const cdkOutPath = path.join(process.cwd(), 'cdk.out');
  const entries = fs.readdirSync(cdkOutPath, { withFileTypes: true });

  entries
    .filter(entry => entry.isDirectory() && entry.name.startsWith('asset.'))
    .forEach(entry => {
      const assetHash = entry.name.replace('asset.', '');
      const assetPath = path.join(cdkOutPath, entry.name);

      if (fs.existsSync(path.join(assetPath, 'Dockerfile'))) {
        const imageName = `cdkasset-${assetHash}`;
        process.chdir(assetPath);
        execSync(`finch build --platform linux/arm64 -t ${imageName} .`, { stdio: 'inherit' });
      }
    });
}

buildDockerAssets();

npm scripts:

{
  "scripts": {
    "build-docker": "npm run cdk synth --silent && node scripts/build-docker.js",
    "deploy": "npm run build-docker && CDK_DOCKER=finch cdk deploy --require-approval never"
  }
}

eduborto avatar Jun 14 '25 05:06 eduborto

Hi @eduborto,

Thank you for reporting this issue with CDK Docker asset building when using Finch. We attempted to reproduce the problem you described but were unable to replicate the failure in our test environment.

Our Reproduction Setup

Environment Details:

  • Finch Version: v1.8.3 (latest)
  • CDK Library Version: 2.201.0 (matching your reported version)
  • CDK CLI Version: 2.1018.1 (latest)
  • Node.js Version: v20.17.0
  • Platform: macOS ARM64
  • Docker: Not running (Finch only)

Test Configuration: We created a minimal CDK stack with:

  • ECS Fargate task definition
  • Docker asset built from a simple Python container
  • Deployment using CDK_DOCKER=finch

Results

  • ✅ First deployment: Successfully built Docker image using Finch and deployed to AWS
  • ✅ Second deployment: Successfully reused existing ECR image and deployed

No "cdkasset-{hash}: not found" errors occurred in our testing.

Critical Question

Does this issue happen consistently every time you deploy, or is it intermittent?

This is important because:

  • If it happens every time: There's likely a systematic configuration difference between our environments
  • If it's intermittent: It could be a race condition, timing issue, or resource contention problem

Additional Information Needed

To help us reproduce and fix this issue, could you also provide:

  1. Consistency details:
       • Does the failure occur on every cdk deploy with Finch?
       • Or does it sometimes work and sometimes fail?
       • If intermittent, what percentage of deployments fail?

  2. Environment specifics:
       • Exact Finch version (finch --version)
       • CDK CLI version (cdk --version)
       • Operating system details
       • Available system resources (memory, disk space)

  3. Minimal reproduction case:
       • Simple CDK project that demonstrates the issue
       • Complete error logs from failed deployments

Understanding the consistency pattern will help us focus our investigation on the right type of root cause.

pahud avatar Jun 14 '25 17:06 pahud

Hi @pahud !

This happens every time, in every cdk deploy with Finch.

I've changed my environment to match yours:

% node -v
v20.17.0
% finch --version
finch version v1.8.3
% npx cdk --version
2.1018.1 (build cb71364)

About the OS:

  • System Version: macOS 15.5 (24F74)
  • Kernel Version: Darwin 24.5.0
  • Memory: 16 GB
% uname -a
Darwin c889f3e05071 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:49 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6000 arm64
=== Memory ===
PhysMem: 14G used (2637M wired, 2643M compressor), 1669M unused.
=== Disk ===
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   460Gi    10Gi   186Gi     6%    426k  1,9G    0%   /

The code is at cdk-finch-repro.zip

And the log at output.redacted.log

I did this run using a brand-new AWS account, and no container image was uploaded to ECR.

% aws ecr describe-images --repository-name cdk-hnb659fds-container-assets-<account_id>-us-west-2
{
    "imageDetails": []
}

eduborto avatar Jun 14 '25 20:06 eduborto

Hi @eduborto,

Thank you for the detailed bug report and excellent reproduction case. We've conducted a thorough investigation using your provided code and setup.

Our Reproduction Results

Environment Setup (Matching Yours):

  • ✅ Finch Version: v1.8.3
  • ✅ CDK CLI Version: 2.1018.1
  • ✅ CDK Library: 2.201.0
  • ✅ Node.js: v22.11.0 (updated from v20.17.0)
  • ✅ Platform: macOS ARM64
  • ✅ CDK_DOCKER=finch environment variable

Reproduction Steps We Followed

  1. Environment Setup:
     cd /Users/hunhsieh/repos/issue-triage/cdk-finch-repro
     finch --version  # Confirmed v1.8.3
     npm install      # Installed dependencies
  2. CDK Asset Generation:
     npx cdk synth    # Successfully generated CloudFormation template and Docker assets
  3. Finch Functionality Verification:
     cd cdk.out/asset.e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e
     finch build -t test-manual-build .  # ✅ Worked perfectly
     finch images | grep test-manual-build  # ✅ Image created successfully
  4. Workaround Script Testing:
     node scripts/build-docker.js  # ✅ Successfully built with CDK naming
     finch images | grep cdkasset  # ✅ Confirmed cdkasset-{hash} image exists
  5. Full CDK Deploy Attempt:
     CDK_DOCKER=finch npx cdk deploy --require-approval never --verbose

Key Findings

  1. Your reproduction case is excellent - we successfully used your code and confirmed all components work individually
  2. Finch functions perfectly - manual Docker builds work flawlessly
  3. Your workaround script is correct - it successfully builds images with proper CDK naming
  4. However, we could NOT reproduce the specific failure you experienced

What We Observed

Our deployment succeeded where yours failed:

# Our successful result:
[15:47:01] CdkFinchReproStack: build: Building Docker image at /Users/hunhsieh/repos/issue-triage/cdk-finch-repro/cdk.out/asset.e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e
[15:47:01] CdkFinchReproStack: debug: shell_open
#7 naming to docker.io/library/cdkasset-e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e:latest done
#7 unpacking to docker.io/library/cdkasset-e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e:latest 0.0s done
#7 DONE 0.0s
CdkFinchReproStack: success: Built TestTask/TestContainer/AssetImage
CdkFinchReproStack: success: Published TestTask/TestContainer/AssetImage

Your failure pattern:

[16:59:59] CdkFinchReproStack: cached: Cached cdkasset-e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e
time="2025-06-14T17:00:00-03:00" level=fatal msg="cdkasset-e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e: not found"
CdkFinchReproStack: success: Built TestTask/TestContainer/AssetImage  # ← False success

Root Cause Analysis

The key difference is the "cached: Cached cdkasset-..." message in your logs. This suggests the bug is environment-specific and likely related to:

  1. CDK Asset Caching Issues: CDK thinks images are cached when they don't exist
  2. State Corruption: Mismatch between CDK's cache and actual Finch images
  3. Race Conditions: Timing issues in CDK's Finch integration
  4. Previous Deployment State: Corrupted state from previous failed deployments

Questions to Help Isolate the Issue

To help the AWS team fix this, could you provide:

  1. Consistency: You have confirmed this happens on every cdk deploy.
  2. Clean State Test: What happens if you run the following?

   rm -rf cdk.out
   rm -rf node_modules/.cache
   finch system prune -a
   CDK_DOCKER=finch cdk deploy --require-approval never
  3. Cache Investigation: Do you see any cached assets before the failure?
 finch images | grep cdkasset
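
The cache investigation above can also be expressed as a comparison between the asset hashes in cdk.out (directories named `asset.<hash>`) and the `cdkasset-<hash>` names that Finch lists. A minimal sketch, assuming the text printed by `finch images` is passed in as a string; `missingAssetImages` is a hypothetical helper, and the sample text is a stand-in rather than real Finch output:

```typescript
// Hypothetical helper: returns the expected cdkasset-<hash> tags that do not
// appear anywhere in the text printed by `finch images`.
function missingAssetImages(assetHashes: string[], finchImagesOutput: string): string[] {
  // Same idea as `grep cdkasset`: collect every cdkasset-<hex> name in the text.
  const present: string[] = finchImagesOutput.match(/cdkasset-[0-9a-f]+/g) || [];
  return assetHashes
    .map(hash => `cdkasset-${hash}`)
    .filter(tag => present.indexOf(tag) === -1);
}

// Placeholder hashes and stand-in listing text, for illustration only.
const expectedHashes = ['e7d74bf4', '2b1d9449'];
const listingText = 'cdkasset-e7d74bf4   latest';

console.log(missingAssetImages(expectedHashes, listingText));
```

A non-empty result alongside a "cached:" message in the deploy log would point to a mismatch between what CDK believes is cached and what the local image store holds.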

Detailed Reproduction Steps We Used

For the AWS team's reference, here are the exact steps we followed:

Step 1: Initial Setup

cd /Users/hunhsieh/repos/issue-triage/cdk-finch-repro
finch --version  # Output: finch version v1.8.3
npm install      # Successfully installed 300 packages

Step 2: CDK Synthesis

npx cdk synth    # Generated assets in cdk.out/
ls cdk.out/      # Confirmed asset.e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e/ exists

Step 3: Manual Finch Verification

cd cdk.out/asset.e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e
finch build -t test-manual-build .
# Result: ✅ Build successful with proper output
finch images | grep test-manual-build
# Result: ✅ Image listed correctly

Step 4: Workaround Script Test

cd /Users/hunhsieh/repos/issue-triage/cdk-finch-repro
node scripts/build-docker.js
# Result: ✅ Successfully built cdkasset-{hash} image
finch images | grep cdkasset
# Result: ✅ CDK-named image exists

Step 5: Full Deploy Test

CDK_DOCKER=finch npx cdk deploy --require-approval never --verbose
# Result: ✅ Complete success - no fatal errors, proper ECR push

Step 6: Error Pattern Testing

finch rmi cdkasset-e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e
finch inspect cdkasset-e7d74bf49e19e94db4ff5f038bde37e6a17e0aeaea70e82061d87c04be7d545e
# Result: ✅ Reproduced exact Finch error messages you reported

What This Tells Us

  1. Your environment triggers the bug, ours doesn't - This confirms it's environment-specific
  2. Finch error messages are identical - When images don't exist, Finch reports the same fatal errors you saw
  3. CDK integration works in clean environments - But fails under certain conditions
  4. The caching mechanism is the likely culprit - Your logs show "cached" messages we didn't see

Recommendation for AWS Team

This bug needs attention because:

  • User's reproduction case is solid though we can't reproduce it
  • The failure mode is silent and dangerous
  • CDK's error handling with Finch is clearly broken in certain conditions
  • The workaround proves the integration should work
  • Our reproduction proves the bug is conditional, not universal

Suggested fixes:

  1. Fix CDK's asset caching logic when using Finch
  2. Improve error handling to catch Finch fatal errors
  3. Add validation to ensure images exist before reporting success
  4. Better cache invalidation when Docker builds fail
  5. Add debug logging to help identify when caching issues occur

For Other Users Experiencing This Issue

If you're hitting the same problem:

  1. Use the workaround script provided by @eduborto - it works perfectly
  2. Clear your CDK cache before deploying:
 rm -rf cdk.out
 finch system prune -a
  3. Monitor for the "cached:" message - if you see it, the bug might trigger

Thank you for the excellent bug report and reproduction case. This helps significantly in understanding the issue even though we couldn't reproduce the exact failure mode. Your detailed logs and code made our investigation possible.

As we still can't reproduce this issue using the provided code, I am making it a p2 bug and we welcome more reports from the community. Please help us prioritize with 👍.

pahud avatar Jun 16 '25 20:06 pahud

Hi @pahud!

Starting with a clean state:

% rm -rf cdk.out
% rm -rf node_modules/.cache
% finch system prune -a # This deleted a lot of stuff

It worked!!!

I'll go back to my usual development workflow without the workaround script, and I'll update this issue if anything comes up. Thank you for your support and attention.

eduborto avatar Jun 23 '25 12:06 eduborto

Thank you very much for the update! I've learned a lesson from Finch as well.

I am setting this issue to auto-close if there are no further updates.

pahud avatar Jun 23 '25 20:06 pahud

@eduborto I came across this issue while tracking another report where finch builds were failing with CDK. The issue was tracked down to a patch in the internal release and has now been fixed.

swagatbora90 avatar Jun 24 '25 21:06 swagatbora90

@pahud Going forward, would it be possible to tag Finch-related issues with a 'Finch' label? I think that would help us answer Finch-related questions and triage reported Finch failures.

swagatbora90 avatar Jun 24 '25 21:06 swagatbora90

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.

github-actions[bot] avatar Jun 27 '25 09:06 github-actions[bot]