
[pipelines] Add stage to strip assets out of cloud assembly before deploying to CloudFormation

Open MamishIo opened this issue 5 years ago • 36 comments

To avoid the CodePipeline artifact size limit in CloudFormation deploy actions, the pipeline should generate an intermediate artifact which is the cloud assembly but with asset files removed, and use this as the input for the deploy actions.

Use Case

Regardless of the source provider used, CFN deploy actions have an input artifact size limit of 256MB. The CDK pipeline uses the initial cloud assembly, containing all asset files, all the way through to the CFN action inputs, even though the stacks don't require them (as far as I understand the asset system, all assets are published and linked to CFN parameters by this point).

For builds that produce large/multiple assets totalling over 256MB, this causes CodePipeline limit errors in the deployment stages. Assemblies up to 1GB or 5GB (depending on the source provider) could be produced with this change.

Specific example: monorepos used to build many related services that are all deployed as separate containers/functions/etc.

Proposed Solution

Add an extra pipeline stage after asset publishing and before application stage deployment, which runs a CodeBuild action to load the cloud assembly, strip out asset files, and generate a new artifact containing only the CFN templates and any data necessary for CFN. The CFN actions should use this new artifact as their input.
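The stripping logic itself is simple. A rough Python sketch of what such a step could do — the `asset.<hash>` naming convention and the flat directory layout are assumptions about the synthesized assembly, not a definitive implementation:

```python
import os
import shutil
import tempfile

def strip_assets(assembly_dir: str, output_dir: str) -> list:
    """Copy a cloud assembly, skipping asset staging entries.

    Assumption: asset files/directories are named 'asset.<hash>...';
    everything else (templates, manifest.json, etc.) is kept.
    """
    os.makedirs(output_dir, exist_ok=True)
    stripped = []
    for entry in os.listdir(assembly_dir):
        src = os.path.join(assembly_dir, entry)
        if entry.startswith("asset."):
            stripped.append(entry)  # drop asset payloads
            continue
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(output_dir, entry))
        else:
            shutil.copy2(src, os.path.join(output_dir, entry))
    return stripped

# Demo on a fake assembly directory.
tmp = tempfile.mkdtemp()
asm = os.path.join(tmp, "cdk.out")
os.makedirs(asm)
for name in ("MyStack.template.json", "manifest.json"):
    open(os.path.join(asm, name), "w").close()
os.makedirs(os.path.join(asm, "asset.0123abcd"))
out = os.path.join(tmp, "cdk.out.stripped")
removed = strip_assets(asm, out)
print(sorted(os.listdir(out)))  # templates and manifest survive
print(removed)                  # asset payloads are dropped
```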

Other

  • This is currently exacerbated by the lack of a de-dupe option when deploying multiple application stages using the same assets - I believe feature request #9627 will reduce code size substantially.

  • Overall code size can be reduced by using Lambda layers, but this adds build and deploy complexity compared to using standalone code assets.

  • [ ] :wave: I may be able to implement this feature request

  • [ ] :warning: This feature might incur a breaking change (:warning: assumes I haven't overlooked some need for the CFN actions to have direct access to asset files)


This is a :rocket: Feature Request

MamishIo avatar Aug 23 '20 09:08 MamishIo

I'm also seeing this error, and I'm very close to hitting the limit, at 253MB with 36 Lambdas, 1 Docker container, and two application stages, Staging and Prod.

seawatts avatar Sep 15 '20 16:09 seawatts

@MamishIo any progress here?

seawatts avatar Sep 27 '20 10:09 seawatts

Ran into this too while deploying to multiple regions: it worked for 2 regions but hit the limit with 3.

jonathan-kosgei avatar Oct 02 '20 18:10 jonathan-kosgei

@MamishIo is there any workaround for this?

jonathan-kosgei avatar Oct 03 '20 14:10 jonathan-kosgei

There is no easy workaround as of yet.

rix0rrr avatar Oct 05 '20 15:10 rix0rrr

What could work is producing 2 cloud artifacts from the synth step (one with the assets, one without) and then using property overrides to switch between them for the different actions.

rix0rrr avatar Oct 05 '20 16:10 rix0rrr

@rix0rrr Is there any timeline when this might be fixed? We're not able to use pipelines for a multi-region setup because of this.

jonathan-kosgei avatar Oct 06 '20 16:10 jonathan-kosgei

There is no timeline as of yet.

rix0rrr avatar Oct 19 '20 07:10 rix0rrr

Another workaround you could try is postprocessing the .json files in a post-build script in your Synth step and dedupe the assets yourself.
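One way such a post-build dedupe pass might look. The manifest structure below is a heavily simplified, assumed stand-in for the real cloud-assembly schema; a real script would likely key on a content hash rather than the staged path:

```python
import json

def dedupe_assets(manifest: dict) -> dict:
    """Collapse duplicate file-asset entries in a (simplified) manifest.

    Assumption: entries whose source path is identical point at the
    same staged asset, so all but the first can be dropped.
    """
    seen = set()
    deduped = {}
    for asset_id, entry in manifest.get("files", {}).items():
        key = entry["source"]["path"]
        if key in seen:
            continue  # drop the duplicate entry
        seen.add(key)
        deduped[asset_id] = entry
    manifest["files"] = deduped
    return manifest

# Two stages referencing the same staged zip under different asset IDs.
manifest = {
    "files": {
        "asset-aaa": {"source": {"path": "asset.aaa/lambda.zip"}},
        "asset-bbb": {"source": {"path": "asset.aaa/lambda.zip"}},
    }
}
print(json.dumps(list(dedupe_assets(manifest)["files"])))
```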

rix0rrr avatar Oct 19 '20 07:10 rix0rrr

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Oct 26 '20 12:10 github-actions[bot]

I'm not sure if this issue is completely fixed by #11008, as it is reported. #11008 only fixes issues of assets being needlessly duplicated—it doesn't do anything to solve the issues with assets needlessly moving forward in the pipeline and potentially hitting size limits. I'm currently encountering this issue, as my assets include several large Docker images with large build context dependencies. As a result, the CloudAssembly artifact hits 4.6GB in size by the time it goes forward into the CFN deployment stage.

@rix0rrr

maxdumas avatar Nov 24 '20 22:11 maxdumas

> I'm not sure if this issue is completely fixed by #11008, as it is reported. #11008 only fixes issues of assets being needlessly duplicated—it doesn't do anything to solve the issues with assets needlessly moving forward in the pipeline and potentially hitting size limits. I'm currently encountering this issue, as my assets include several large Docker images with large build context dependencies. As a result, the CloudAssembly artifact hits 4.6GB in size by the time it goes forward into the CFN deployment stage.

Also faced this issue recently. Even after some optimization, I'm uncomfortably close to the 256MB limit.

vibe avatar Nov 30 '20 06:11 vibe

@rix0rrr Any chance we could get this re-opened? See my comment above.

maxdumas avatar Jan 25 '21 15:01 maxdumas

This is continually causing us pain when deploying a bunch of static assets via https://docs.aws.amazon.com/cdk/api/latest/docs/aws-s3-deployment-readme.html.

shortjared avatar Feb 03 '21 20:02 shortjared

This is currently causing us pain. The Master build won't go through on our pipeline. We have the following error staring at us:

Action execution failed Artifact [Artifact_Build_Synth] exceeds max artifact size

Please help.

drdivine avatar Apr 20 '21 09:04 drdivine

Artifact [CloudAssemblyArtifact] exceeds max artifact size

This is a real pain and it breaks our pipelines. Any chance the 256 MB limit can be increased?

mpuhacz avatar May 10 '21 19:05 mpuhacz

Any update on this?

SimonJang avatar Sep 30 '21 11:09 SimonJang

Does anyone have any working workarounds for this? And @AWS team, is there anything specific that could be worked on to help here?

ChrisSargent avatar Nov 08 '21 15:11 ChrisSargent

It would be great to at least have a viable workaround for this until a fix is put into place. It's causing a lot of pain.

hoos avatar Nov 09 '21 16:11 hoos

> Does anyone have any working work-arounds for this - and @aws team, is there anything specific that could be worked on to aid in this?

You can try something like this; it's quite hacky, but it works for me. Add a new pre-stage Shell/CodeBuild step: get the current (latest) cloud-assembly artifact from S3, remove the assets (in my case, all JARs), and copy it back to S3. That's it. The buildspec should look something like this:

{
  "version": "0.2",
  "phases": {
    "build": {
      "commands": [
        "LATEST=$(aws s3 ls s3://<path-to-cloudassembly>/ | sort | tail -n 1 | awk '{print $4}')",
        "aws s3 cp s3://<path-to-cloudassembly>/$LATEST .",
        "unzip $LATEST -d tmp",
        "cd tmp",
        "rm -rf *.jar",
        "zip -r -A $LATEST *",
        "aws s3 cp $LATEST s3://<path-to-cloudassembly>/"
      ]
    }
  }
}

Don't forget to add an S3::PutObject permission to the service role.

rmanig avatar Dec 17 '21 09:12 rmanig

We hit this issue today too.

@aws team, it would sure be nice if someone added a clearer guide on how to work around this, given that a fix doesn't sound like it's on the radar soon.

thank you

ewahl-al avatar Feb 16 '22 23:02 ewahl-al

This might not be applicable for most people since my project is a bit weird (e.g. Java instead of TS, legacy CDK pipeline lib, CodeBuild synth via buildspec.yml file...) but I finally put together a workaround for this by generating a second no-assets artifact and post-processing the pipeline template to use the new artifact for CFN actions (side note: I'd have preferred doing this purely in CDK but it seemed impractical in this case).

https://github.com/HtyCorp/serverbot2-core/commit/d4397291b98098ae2d337ef86dd4ba8f580ff09a

The pipeline is spitting out 260MB assemblies now but deploying without any problems! Hope that helps someone even if it's not a great general solution.

MamishIo avatar Feb 19 '22 07:02 MamishIo

Unless I'm mistaken, all assets are already published past the Assets step, meaning it is safe to strip all assets from the synth output in an initial Wave. I believe this is a generic solution that is basically plug-and-play for aws-cdk 2.12. The rm -rfv <files> pattern may need customization for your needs.

        strip_assets_step = CodeBuildStep(
            'StripAssetsFromAssembly',
            input=pipeline.cloud_assembly_file_set,
            commands=[
                'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
                'ZIP_ARCHIVE=$(basename $S3_PATH)',
                'rm -rfv asset.*',
                'zip -r -q -A $ZIP_ARCHIVE *',
                'aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH',
            ],
        )
        pipeline.add_wave('BeforeDeploy', pre=[strip_assets_step])
        # Add your stages...
        
        pipeline.build_pipeline()
        pipeline.pipeline.artifact_bucket.grant_write(strip_assets_step.project)

tobni avatar Mar 10 '22 09:03 tobni

@tobni the strip_assets_step is working correctly for me and shows the artifact SynthOutput is 1.1 MB; however, the subsequent stages in the wave still get an input artifact SynthOutput that's 200MB+. Is there a missing step to get them to use the output from strip_assets_step?

Edit: It seems to work, but only in the region the pipeline is in. This is because the other regions get the assembly from a different bucket, with a name format like <stack-name>-seplication<some region specific id>. I don't see a way to get the names of the region-specific S3 artifact buckets to copy the new zip to.

jonathan-kosgei avatar Apr 02 '22 18:04 jonathan-kosgei

I finally got @tobni's code to work with cross region replication, which uses a different randomly named bucket for every region!

strip_assets_step = CodeBuildStep(
    'StripAssetsFromAssembly',
    input=pipeline.cloud_assembly_file_set,
    commands=[
        "cross_region_replication_buckets=$(grep BucketName cross-region-stack-* | awk -F ':' '{print $4}' | tr '\n' ' ' | tr -d '\"')",
        'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
        'ZIP_ARCHIVE=$(basename $S3_PATH)',
        'rm -rf asset.*',
        'zip -r -q -A $ZIP_ARCHIVE *',
        'aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH',
        'object_location=${S3_PATH#*/}',
        'for bucket in $cross_region_replication_buckets; do aws s3 cp $ZIP_ARCHIVE s3://$bucket/$object_location; done'
    ],
)

And you need the following permissions

pipeline.build_pipeline()
pipeline.pipeline.artifact_bucket.grant_write(strip_assets_step.project)
strip_assets_step.project.add_to_role_policy(
    iam.PolicyStatement(
        effect=iam.Effect.ALLOW,
        resources=[f"arn:aws:s3:::<pipeline stack name>-seplication/*", f"arn:aws:s3:::<pipeline stack name>-seplication*"],
        actions=["s3:*"],
    )
)
strip_assets_step.project.add_to_role_policy(
    iam.PolicyStatement(
        effect=iam.Effect.ALLOW,
        resources=["*"],
        actions=["kms:GenerateDataKey"]
    )
)

jonathan-kosgei avatar Apr 12 '22 18:04 jonathan-kosgei

@tobni and @jonathan-kosgei, thanks a lot for the help. Just leaving my TS version here for folks to copy and paste.

    const strip = new CodeBuildStep("StripAssetsFromAssembly", {
      input: pipeline.cloudAssemblyFileSet,
      commands: [
        'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
        "ZIP_ARCHIVE=$(basename $S3_PATH)",
        "echo $S3_PATH",
        "echo $ZIP_ARCHIVE",
        "ls",
        "rm -rfv asset.*",
        "zip -r -q -A $ZIP_ARCHIVE *",
        "ls",
        "aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH",
      ],
      rolePolicyStatements: [
        new iam.PolicyStatement({
          effect: iam.Effect.ALLOW,
          resources: ["*"],
          actions: ["s3:*"],
        }),
        new iam.PolicyStatement({
          effect: iam.Effect.ALLOW,
          resources: ["*"],
          actions: ["kms:GenerateDataKey"],
        }),
      ],
    });

    pipeline.addWave("BeforeStageDeploy", {
      pre: [strip],
    });

BenassiJosef avatar Apr 17 '22 15:04 BenassiJosef

Tagging @rix0rrr and @MamishIo following advice from the Comment Visibility Warning.

I ran into this issue today.

I believe the current situation is that people have found a bit of an icky workaround in adding extra CodeBuildSteps to clean out the assets in the SynthOutput (See above comments) but it would be great to not have to do this.

Based on what others have said, it seems like the SynthOutput doesn't need to be passed along at all in the first place and could be removed; doing so would render this workaround unnecessary.

rurounijones avatar Sep 02 '22 17:09 rurounijones

We hit this issue this week and had to put together a work around from the answers here.

Adding to @jonathan-kosgei's comment with a version of the awk command that works with any number of cross-region stacks. The version above works when there is more than one cross-region stack, but awk-ing on : with $4 fails when only one cross-region stack is present; in that case the element of interest is at $2. Splitting on BucketName instead works regardless of the number of cross-region stacks.
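To see why the field index shifts: grep only prefixes each match with the file name when it is given more than one file, and the cross-region stack file name itself appears to contain a colon. A quick illustration (the exact file name and bucket name below are made up):

```python
# Simulated grep output when matching across MULTIPLE files: grep
# prepends "filename:", and the assumed cross-region stack file name
# contains a colon of its own, pushing the value to awk's $4.
multi_file = ('cross-region-stack-123456789012:us-west-2.template.json:'
              '    "BucketName": "my-replication-bucket",')

# Simulated grep output for a SINGLE file: no filename prefix,
# so the value is at awk's $2.
single_file = '    "BucketName": "my-replication-bucket",'

print(multi_file.split(':')[3])   # corresponds to awk -F ':' '{print $4}'
print(single_file.split(':')[1])  # corresponds to awk -F ':' '{print $2}'
```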

strip_assets_step = CodeBuildStep(
    'StripAssetsFromAssembly',
    input=pipeline.cloud_assembly_file_set,
    commands=[
        "cross_region_replication_buckets=$(grep BucketName cross-region-stack-* | awk -F 'BucketName' '{print $2}' | tr -d ': ' | tr -d '\"' | tr -d ',')",
        'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
        'ZIP_ARCHIVE=$(basename $S3_PATH)',
        'rm -rf asset.*',
        'zip -r -q -A $ZIP_ARCHIVE *',
        'aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH',
        'object_location=${S3_PATH#*/}',
        'for bucket in $cross_region_replication_buckets; do aws s3 cp $ZIP_ARCHIVE s3://$bucket/$object_location; done'
    ],
)

You can also access the replication bucket names dynamically from pipeline.pipeline.cross_region_support:

pipeline.build_pipeline()
cross_region_support = pipeline.pipeline.cross_region_support
replication_bucket_arns = [
    cross_region_support[key].replication_bucket.bucket_arn
    for key in cross_region_support.keys()]
replication_bucket_objects = [arn + '/*' for arn in replication_bucket_arns]
replication_resources = replication_bucket_arns + replication_bucket_objects
pipeline.pipeline.artifact_bucket.grant_write(strip_assets_step.project)
strip_assets_step.project.add_to_role_policy(
    cdk.aws_iam.PolicyStatement(
        effect=cdk.aws_iam.Effect.ALLOW,
        resources=replication_resources,
        actions=["s3:*"],
    )
)
strip_assets_step.project.add_to_role_policy(
    cdk.aws_iam.PolicyStatement(
        effect=cdk.aws_iam.Effect.ALLOW,
        resources=["*"],
        actions=["kms:GenerateDataKey"]
    )
)

hrvg avatar Dec 20 '22 08:12 hrvg

@rix0rrr it seems the common workaround is to wipe out the assets. Is this a suggested workaround?

wr-cdargis avatar Feb 17 '23 20:02 wr-cdargis

Throwing in my TypeScript solution for cross-region buckets based on the above:

    const { crossRegionSupport, artifactBucket } = pipeline.pipeline
    const artifactBuckets = [
      artifactBucket,
      ...Object.values(crossRegionSupport).map((crs) => crs.replicationBucket),
    ]
    for (const bucket of artifactBuckets) {
      bucket.grantReadWrite(stripAssetsStep.project)
    }

moltar avatar Apr 10 '23 11:04 moltar