pulumi-aws icon indicating copy to clipboard operation
pulumi-aws copied to clipboard

Failed to create and update `controltower.LandingZone`

Open aureq opened this issue 11 months ago • 1 comments

What happened?

When trying to create an AWS ControlTower LandingZone resource, Pulumi appears to be confused during the resource lifecycle.

On the 1st apply, if the properties manifestJson.centralizedLogging.configurations.accessLogginBucket.retentionDays and manifestJson.centralizedLogging.configurations.logginBucket.retentionDays are set as string, the resource creation will fail and the AWS api returns an error ❌. However, if those properties are set at int, then the resource is correctly created ✅.

On the 2nd and following updates, if those properties remain as int (no changes, from the previous successful step) in the code, Pulumi indicates an update on the resource is required and the values should be updated to string (see screenshot below) ❌.

If the values are switched to string in the code, then Pulumi doesn't show any changes ❌ because it seems it has recorded the values as string in its state (see state below).

If the values are changed to string in the code and other modifications are made to the resource, then Pulumi sees changes ✅ but the resource fails to update like it does during the 1st apply ❌ with an API error from AWS.

It appears there's some implicit conversions happening while processing the manifestJson and how it is stored in the stack state.

Example

source code

import json

import pulumi

import pulumi_aws as aws

root_ou_id = 'something'

log_archive_account = aws.organizations.Account(
    'log-archive',
    name = 'Audit',
    email = '[email protected]',
    parent_id = root_ou_id
)

audit_account = aws.organizations.Account(
    'audit',
    name = 'Audit',
    email = '[email protected]',
    parent_id = root_ou_id
)

kms_key = aws.kms.Key(
    'landing-zone-key',
    description = 'A key for a landing zone',
    enable_key_rotation = True,
    deletion_window_in_days = 30
)

key_policy = pulumi.Output.all(
    region=aws.get_region(),
    key_id=kms_key.id,
    account=aws.get_caller_identity().account_id,
).apply(lambda args: json.dumps(
    {
        "Version": "2012-10-17",
        "Id": "control-tower-landing-zone",
        "Statement": [
            {
                "Sid": "Enable IAM User Permissions",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{args['account']}:root"
                },
                "Action": "kms:*",
                "Resource": "*"
            },
            {
                "Sid": "Allow Config to use KMS for encryption",
                "Effect": "Allow",
                "Principal": {
                    "Service": "config.amazonaws.com"
                },
                "Action": [
                    "kms:Decrypt",
                    "kms:GenerateDataKey"
                ],
                "Resource": f"arn:aws:kms:{args['region']}:{args['account']}:key/{args['key_id']}"
            },
            {
                "Sid": "Allow CloudTrail to use KMS for encryption",
                "Effect": "Allow",
                "Principal": {
                    "Service": "cloudtrail.amazonaws.com"
                },
                "Action": [
                    "kms:GenerateDataKey*",
                    "kms:Decrypt"
                ],
                "Resource": f"arn:aws:kms:{args['region']}:{args['account']}:key/{args['key_id']}",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceArn": f"arn:aws:cloudtrail:{args['region']}:{args['account']}:trail/aws-controltower-BaselineCloudTrail"
                    },
                    "StringLike": {
                        "kms:EncryptionContext:aws:cloudtrail:arn": f"arn:aws:cloudtrail:*:{args['account']}:trail/*"
                    }
                }
            }
        ]
    }
))

aws.kms.KeyPolicy(
    'landing-zone-key-policy',
    key_id = kms_key.id,
    policy = key_policy,
    opts = pulumi.ResourceOptions(
        parent = kms_key
    )
)

manifest_json = pulumi.Output.all(
    kms_key= kms_key.arn,
    log_archive_account=log_archive_account.id,
    audit_account=audit_account.id,
    region=aws.get_region()
).apply(lambda args: json.dumps(
    {
        "governedRegions": [f"{args['region']}"],
        "organizationStructure": {
            "security": {
            "name": "Security"
            }
        },
        "centralizedLogging": {
            "accountId": f"{args['log_archive_account']}",
            "configurations": {
            "accessLoggingBucket": {
                "retentionDays": 3650
            },
            "kmsKeyArn": f"{args['kms_key']}",
            "loggingBucket": {
                "retentionDays": 365
            }
            },
            "enabled": True
        },
        "securityRoles": {
            "accountId": f"{args['audit_account']}"
        },
        "accessManagement": {
            "enabled": True
        }
    }
))

aws.controltower.LandingZone(
    'landing_zone',
    manifest_json = manifest_json,
    version = '3.3',
    opts = pulumi.ResourceOptions(
        depends_on = [key_policy, log_archive_account, audit_account]
    )
)

stack state

state.zip

pulumi diff

image

Output of pulumi about

pulumi/aws 6.24.0

Additional context

Note, creating such resource is difficult in nature as the requirements are

  • root access to the AWS account
  • multiple AWS accounts are required
  • properly setup AWS Organizations

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

aureq avatar Mar 18 '24 01:03 aureq

Possibly related to upstream issue https://github.com/hashicorp/terraform-provider-aws/issues/35763

aureq avatar Mar 18 '24 01:03 aureq

Unfortunately I could not run a full repro since our sandbox account does not support aws.organizations.Account but I could run a partial repro based on the provided statefile and program, trying out pulumi diff.

The provider does not diff or modify retentionDays. If the program and state match (either strings, or numbers), there is no diff.

This is because both providers model manifest as strings.

There is some normalization going on in the provider but it simply asserts that JSON whitespace is irrelevant:

https://github.com/hashicorp/terraform-provider-aws/blob/master/internal/service/controltower/landing_zone.go#L78

What then happens is that this data is passed down to the AWS SDK for Go which uses the Smithy IDL and client generator to talk to the service. Successfully created resources are normalized via read and pulling the manifest from the cloud:

https://github.com/hashicorp/terraform-provider-aws/blob/master/internal/service/controltower/landing_zone.go#L132

It appears that either the Smithy IDL code or the service itself normalizes these values.

I could not locate the Smithy definition for Control Tower.

The recommended way to specify the retentionDays property is using numbers:

https://docs.aws.amazon.com/controltower/latest/userguide/lz-api-launch.html

I believe that if you specify retention days as numbers, Pulumi will work correctly and no further action is required.

We could attempt some Pulumi-level normalization or validation here but it would be preferable if this was done upstream at either the Terraform provider or even better at the AWS SDK for Go layer since they have access and are managing the original Smithy manifest for the data here.

t0yv0 avatar Apr 09 '24 19:04 t0yv0

Checking with @aureq this is actually worse for the customer as they're sending numbers in and it still converts to string. We definitely should fix this. Will look into getting creds for an e2e repro.

t0yv0 avatar Apr 10 '24 18:04 t0yv0