aws-cdk
(aws-s3): bucket policy fails to create when bucket:arn is not yet available
Describe the bug
A dependency issue between S3 Buckets and Bucket Policies in the L2 Bucket class allows the Policy to access the arn of the bucket before it is available, causing the creation of the Bucket Policy to fail. Being a dependency issue, this is an intermittent issue and works correctly the vast majority of the time. When it fails, simply relaunching the stack usually works.
Expected Behavior
The L2 Bucket construct should launch successfully every time.
Current Behavior
testPolicy9D625504
CREATE_FAILED
Unable to retrieve Arn attribute for AWS::S3::Bucket, with error message Bucket not found
Reproduction Steps
I created a simple CDK app with this code:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';
export class BucketPolicyDependencyStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    new s3.Bucket(this, 'test', {
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });
  }
}
I then set up a bash script that launched it 30 times, essentially simultaneously:
# Any 30 distinct values work here; I just used 30 integers
export constructs="$(seq 1 30)"
for iteration in $constructs; do
  export STACK_NAME=stresstest$iteration
  cdk deploy -o stress$iteration --require-approval never &
done
On 1 of the 30 I saw the error I reference above.
Possible Solution
If I am interpreting the behavior correctly, it seems that adding a Dependency on the Bucket to the BucketPolicy in the L2 Construct would prevent the Policy from trying to access the bucket before it is ready. Perhaps here? https://github.com/aws/aws-cdk/blob/3318a38a6092275d461ef3549f3b92cd0d040c18/packages/aws-cdk-lib/aws-s3/lib/bucket.ts#L651
Additional Information/Context
We've seen it in several of our constructs (and in newer versions of the CDK than the one I cite below for the test above). Someone also mentioned they have seen it in aws-codepipeline.
CDK CLI Version
2.108.0
Framework Version
2.108.0
Node.js Version
20.9.0
OS
MacOS Ventura 13.6.3
Language
TypeScript
Language Version
TypeScript 5.2.2
Other information
Versions cited are for the test I cited, but it's been seen in other versions as well.
Unfortunately, I couldn't reproduce this after a few attempts.
export class Demo extends DemoStack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);
    new s3.Bucket(this, 'test', {
      removalPolicy: RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });
  }
}
app.ts
for (let i = 0; i < 30; i++) {
  new Demo(app, `demo${i}stack`, { env });
}
And I deploy with
npx cdk deploy --all --require-approval never --concurrency 30
I didn't see any error after a few attempts.
Can you try it again?
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
Working on replicating again. I'm at a loss for a method to recreate it deterministically - it appears to be triggered by the S3 Create Bucket being a bit slow. I'm going to try to set up a test that just keeps repeating the stress test indefinitely, hoping to catch the slower S3 behavior when it occurs.
I am facing the exact same issue. It seems that CloudFormation tries to create the bucket policy before the bucket creation is complete. It's inconsistent, but I saw it a few times in the last 2-3 weeks.
I was able to recreate it. I set up a loop that repeatedly launched and destroyed 40 stacks. I started the loop at around 11:30 AM and finally saw the issue recur at 11:34 AM. As I said, it is intermittent...
I am also facing a similar issue. It seems to happen intermittently and started becoming an issue just before Christmas. Note that the buckets (and the stacks they are in) haven't been changed for a few months, so this seems like a fairly new problem.
Talking to some coworkers, our theory is that the issue is not CDK per se - that a change in CloudFormation led to CloudFormation ceasing to recognize the dependency of the policy on the bucket from the context of the template (I'm running my tests using the generated template rather than the CDK program to confirm this).
If this is the case, then the issue is not necessarily within the CDK - but an update to the S3 Bucket construct to explicitly set the dependency would smooth over the CFN issue.
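To illustrate the explicit-dependency idea at the template level, here is a minimal sketch that post-processes a synthesized template and adds a DependsOn from each AWS::S3::BucketPolicy to the bucket it references via Ref. The helper name addExplicitBucketDependencies and the template types are hypothetical and not part of the CDK; DependsOn itself is a standard CloudFormation resource attribute. Whether an explicit dependency actually avoids the intermittent failure is not confirmed anywhere in this thread.

```typescript
// Minimal shape of a synthesized CloudFormation template (only the fields used here).
interface CfnResourceLike {
  Type: string;
  Properties?: Record<string, unknown>;
  DependsOn?: string | string[];
}

interface CfnTemplateLike {
  Resources: Record<string, CfnResourceLike>;
}

// Returns the logical ID when `value` is a { Ref: "<logicalId>" } object, else undefined.
function refTarget(value: unknown): string | undefined {
  if (typeof value === 'object' && value !== null && 'Ref' in value) {
    const ref = (value as { Ref: unknown }).Ref;
    return typeof ref === 'string' ? ref : undefined;
  }
  return undefined;
}

// Hypothetical helper: adds an explicit DependsOn from every AWS::S3::BucketPolicy
// to the bucket it references via Ref. Returns a deep copy; the input is not modified.
function addExplicitBucketDependencies(template: CfnTemplateLike): CfnTemplateLike {
  const patched: CfnTemplateLike = JSON.parse(JSON.stringify(template));
  for (const logicalId of Object.keys(patched.Resources)) {
    const resource = patched.Resources[logicalId];
    if (resource === undefined || resource.Type !== 'AWS::S3::BucketPolicy') continue;
    const bucketId = refTarget(resource.Properties?.['Bucket']);
    // Skip policies whose bucket is not a resource in this template (e.g. imported buckets).
    if (bucketId === undefined || !(bucketId in patched.Resources)) continue;
    const dependsOn: string[] =
      resource.DependsOn === undefined ? [] : ([] as string[]).concat(resource.DependsOn);
    if (dependsOn.indexOf(bucketId) === -1) dependsOn.push(bucketId);
    resource.DependsOn = dependsOn;
  }
  return patched;
}
```

The sketch only follows plain Ref references, which is an assumption about how the policy points at its bucket in the generated template; other reference forms would be left untouched.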
I am having the same issue when just creating a bucket with an access policy.
const logBucket = new Bucket(
  this,
  `${config.kitName}-alb-logs-bucket`,
  {
    blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
    removalPolicy: RemovalPolicy.DESTROY,
    autoDeleteObjects: true,
  }
);
Unable to retrieve Arn attribute for AWS::S3::Bucket, with error message Bucket not found
This is confirmed to be a CloudFormation issue. The word from AWS is:
Due to a recent change in internal workflow of CloudFormation, our development teams have identified an issue that can cause this error intermittently. They are currently working on deploying a fix for the same.
So it seems that no change to the CDK is needed. For the moment we just retry after a failure, and hopefully the issue clears up entirely soon.
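The retry-on-failure approach can be sketched as a small loop that retries only when the failure matches the error text reported in this thread. This is a minimal sketch: DeployResult, deployWithRetry, and the injected deployOnce callback are hypothetical names, and the wiring that would actually run cdk deploy and capture its output is deliberately left out.

```typescript
// Outcome of one deployment attempt; `message` carries the failure reason, if any.
interface DeployResult {
  ok: boolean;
  message?: string;
}

// Error text reported in this thread for the intermittent failure.
const TRANSIENT_BUCKET_ERROR = 'Unable to retrieve Arn attribute for AWS::S3::Bucket';

// True only for failed attempts whose message contains the known transient error.
function isTransientBucketError(result: DeployResult): boolean {
  return !result.ok && (result.message ?? '').indexOf(TRANSIENT_BUCKET_ERROR) !== -1;
}

// Hypothetical helper: calls `deployOnce` until it succeeds, fails with a
// different error, or `maxAttempts` is reached. Returns the last result and
// the number of attempts made.
function deployWithRetry(
  deployOnce: () => DeployResult,
  maxAttempts: number,
): { result: DeployResult; attempts: number } {
  let result: DeployResult = { ok: false, message: 'no attempts made' };
  let attempts = 0;
  while (attempts < maxAttempts) {
    attempts += 1;
    result = deployOnce();
    if (!isTransientBucketError(result)) break; // success or an unrelated failure
  }
  return { result, attempts };
}
```

Retrying only on the known message keeps unrelated failures visible instead of masking them behind repeated attempts.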
I am seeing this issue myself quite frequently. As with everyone else who has commented, this is a new behavior that was not occurring before.
I am using the CDK BucketDeployment, which automatically generates a parallel construct containing a Lambda function, IAM role and policy. It is the policy that is trying to reference the ARN of the bucket with Fn::GetAtt in the synthesized output. This seems to be failing about 50% of the time. I can cope with this by retrying the stack creation, and CloudFormation will simply start where it left off and complete the rest of the way.
biffgaut, can you reference where you found the AWS issue being reported? This is something I would want to monitor (and possibly bug them about - it's a pain).
Thanks.
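For anyone who wants to check which resources in their synthesized output read a bucket's ARN through Fn::GetAtt, as described in the comment above, here is a minimal sketch that scans a template object. The names TemplateLike, referencesArnOf, and resourcesReadingBucketArn are hypothetical, and the scan assumes the array form of Fn::GetAtt ([logicalId, "Arn"]).

```typescript
// Minimal shape of a synthesized template; resource bodies are treated as opaque JSON.
interface TemplateLike {
  Resources: Record<string, unknown>;
}

// Recursively checks whether `node` contains { "Fn::GetAtt": [logicalId, "Arn"] }.
function referencesArnOf(node: unknown, logicalId: string): boolean {
  if (Array.isArray(node)) {
    return node.some((item) => referencesArnOf(item, logicalId));
  }
  if (typeof node !== 'object' || node === null) return false;
  const record = node as Record<string, unknown>;
  const getAtt = record['Fn::GetAtt'];
  if (Array.isArray(getAtt) && getAtt[0] === logicalId && getAtt[1] === 'Arn') {
    return true;
  }
  return Object.keys(record).some((key) => referencesArnOf(record[key], logicalId));
}

// Hypothetical helper: sorted logical IDs of resources that read the bucket's Arn attribute.
function resourcesReadingBucketArn(template: TemplateLike, bucketId: string): string[] {
  return Object.keys(template.Resources)
    .filter((id) => id !== bucketId && referencesArnOf(template.Resources[id], bucketId))
    .sort();
}
```

Resources that reference the bucket only through Ref are intentionally not reported, since the error in this thread concerns retrieving the Arn attribute.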
That message was from an internal ticket here at AWS - there isn't any further info available at the moment. I have not seen this issue referenced online anywhere but here, which is shocking to me as it has occurred on several workloads managed by our team so I would assume the impact is bigger than the few people monitoring this issue.
As an FYI this has happened ~60 times in the last 60 days so @biffgaut you're not alone here.
We are also running into this issue with Lambda function roles; I suspect it's not isolated to bucket policies.
I opened a support ticket with the AWS CloudFormation team. They repeated to me the same thing they told biffgaut. They did say this was a high-priority issue, so I'd like to think the resolution is imminent. Support tickets are not allowed to be left open for more than 10 days for known bugs, but the AWS support rep did tell me that I could ask my organization's AWS account rep to ping me when the bug is fixed, or possibly the ticket might remain open until the fix is in because I asked for it to be. In any event, it looks like I will get notified somehow. When I do, I'll update this issue.
I am also facing the same problem. It is really annoying as it is hampering deployments. Has anyone figured out a workaround?
I am also experiencing the same issue.
Workaround for the issue for now:
Option 1:
- Create the S3 bucket manually
- Import the S3 bucket into the CDK code using the cdk import command
- You can use cdk diff to see any differences between your manually created S3 bucket and your CDK stack. Change your CDK stack to match that of the Config. Source: cdk import
Option 2:
- Create the bucket without any bucket policy options (such as removal_policy, enforce_ssl) using CDK. So create a baseline bucket with a bucket name only.
- Deploy the baseline bucket
- Add bucket policy options to the CDK code and re-deploy the stack. (You can add any S3 bucket parameters that don't need to delete and recreate the bucket.)
- This should not remove the bucket but append the policy to it. Hence it should be able to find the bucket ARN when it is being deployed.
Happening again yall...
Hi, if you're running into this issue while serving a static site out of an S3 bucket via CloudFront, you can split the code into 2 stacks for a more reliable CI/CD process.
Bucket Stack:
/**
 * Content bucket
 */
new s3.Bucket(this, 'SiteBucket', {
  bucketName: `${buildDomain(props.domainSegments)}`,
  websiteIndexDocument: 'index.html',
  websiteErrorDocument: 'index.html',
  // publicReadAccess: true,
  // autoDeleteObjects: true,
  // accessControl: BucketAccessControl.PUBLIC_READ,
  /**
   * The default removal policy is RETAIN, which means that cdk destroy will not attempt to delete
   * the new bucket, and it will remain in your account until manually deleted. By setting the policy to
   * DESTROY, cdk destroy will attempt to delete the bucket, but will error if the bucket is not empty.
   */
  // removalPolicy: cdk.RemovalPolicy.DESTROY, // NOT recommended for production code
});
Distro Stack (with domain stuff):
/**
 * Hosted zone
 */
const zone = route53.HostedZone.fromLookup(this, 'Zone', {
  domainName: props.domainSegments.domain,
});
new cdk.CfnOutput(this, 'URL', {
  value: `https://${util.buildDomain(props.domainSegments)}`,
});
/**
 * TLS certificate
 */
const certificate = new acm.Certificate(this, 'Certificate', {
  domainName: `${util.buildDomain(props.domainSegments)}`,
  validation: acm.CertificateValidation.fromDns(zone),
});
new cdk.CfnOutput(this, 'CertificateOutput', {
  value: certificate.certificateArn,
});
/**
 * Cloudfront OAI
 */
const oai = new cloudfront.OriginAccessIdentity(this, 'OAI');
const bucket = s3.Bucket.fromBucketName(
  this,
  'StaticSiteBucket',
  `${util.buildDomain(props.domainSegments)}`
);
bucket.grantPublicAccess();
const bucketPolicy = new s3.BucketPolicy(this, 'BucketPolicy', {
  bucket,
});
// Grant public access through the bucket policy
bucketPolicy.document.addStatements(
  new iam.PolicyStatement({
    actions: ['s3:GetObject'],
    resources: [bucket.arnForObjects('*')],
    principals: [
      new iam.CanonicalUserPrincipal(
        oai.cloudFrontOriginAccessIdentityS3CanonicalUserId
      ),
    ],
  })
);
new cdk.CfnOutput(this, 'SiteBucketOutput', { value: bucket.bucketName });
/**
 * CloudFront distribution that provides HTTPS
 */
this.distribution = new cloudfront.Distribution(this, 'myDist', {
  defaultRootObject: 'index.html',
  minimumProtocolVersion: cloudfront.SecurityPolicyProtocol.TLS_V1_2_2021,
  defaultBehavior: {
    origin: new cloudfront_origins.S3Origin(bucket, {
      originAccessIdentity: oai,
    }),
    compress: true,
    allowedMethods: cloudfront.AllowedMethods.ALLOW_GET_HEAD_OPTIONS,
    viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
  },
  errorResponses: [
    {
      httpStatus: 403,
      responseHttpStatus: 403,
      responsePagePath: '/index.html',
      ttl: cdk.Duration.minutes(30),
    },
  ],
  domainNames: [`${util.buildDomain(props.domainSegments)}`],
  certificate: certificate,
});
new cdk.CfnOutput(this, 'DistributionIdOutput', {
  value: this.distribution.distributionId,
});
/**
 * Route53 alias record for the CloudFront distribution
 */
new route53.ARecord(this, 'SiteAliasRecordOutput', {
  recordName: `${util.buildDomain(props.domainSegments)}`,
  target: route53.RecordTarget.fromAlias(
    new route53_targets.CloudFrontTarget(this.distribution)
  ),
  zone,
});
/**
 * Build sources depending on if there are more things that need to be added
 * Take the strings in extraSources and map them to extra sources
 */
const sources = props.extraSources
  ? [
      ...props.extraSources.map((path) => s3_deployment.Source.asset(path)),
      s3_deployment.Source.asset(props.pathToAssets),
    ]
  : [s3_deployment.Source.asset(props.pathToAssets)];
/**
 * Automated s3 deployment
 */
new s3_deployment.BucketDeployment(this, 'DeployWithInvalidation', {
  sources: [...sources],
  destinationBucket: bucket,
  distribution: this.distribution,
  distributionPaths: ['/*'],
});
Also, pay me.
Is there any update to this? I am attempting to deploy a bucket and a stackset and the stackset fails because the bucket policy does not finish deploying, despite the policy not being built until after the bucket.