terraform-aws-control_tower_account_factory

Error create new account

Open allanmori opened this issue 3 years ago • 8 comments

AFT Version:

1.4.0

Terraform Version & Provider Versions

terraform version

Terraform v1.1.9
on darwin_arm64
+ provider registry.terraform.io/hashicorp/archive v2.2.0
+ provider registry.terraform.io/hashicorp/aws v3.75.2
+ provider registry.terraform.io/hashicorp/local v2.2.3
+ provider registry.terraform.io/hashicorp/random v3.2.0
+ provider registry.terraform.io/hashicorp/time v0.7.2

terraform providers

Providers required by configuration:
.
├── provider[registry.terraform.io/hashicorp/aws]
└── module.aft
    ├── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    ├── provider[registry.terraform.io/hashicorp/local]
    ├── module.aft_iam_roles
    │   ├── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    │   ├── module.ct_management_service_role
    │   │   └── provider[registry.terraform.io/hashicorp/aws]
    │   ├── module.log_archive_exec_role
    │   │   └── provider[registry.terraform.io/hashicorp/aws] >= 2.7.0
    │   ├── module.log_archive_service_role
    │   │   └── provider[registry.terraform.io/hashicorp/aws]
    │   ├── module.aft_exec_role
    │   │   └── provider[registry.terraform.io/hashicorp/aws] >= 2.7.0
    │   ├── module.aft_service_role
    │   │   └── provider[registry.terraform.io/hashicorp/aws]
    │   ├── module.audit_exec_role
    │   │   └── provider[registry.terraform.io/hashicorp/aws] >= 2.7.0
    │   ├── module.audit_service_role
    │   │   └── provider[registry.terraform.io/hashicorp/aws]
    │   └── module.ct_management_exec_role
    │       └── provider[registry.terraform.io/hashicorp/aws] >= 2.7.0
    ├── module.aft_lambda_layer
    │   ├── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    │   ├── provider[registry.terraform.io/hashicorp/random]
    │   └── provider[registry.terraform.io/hashicorp/local]
    ├── module.aft_ssm_parameters
    │   └── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    ├── module.packaging
    │   └── provider[registry.terraform.io/hashicorp/archive]
    ├── module.aft_backend
    │   └── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    ├── module.aft_code_repositories
    │   ├── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    │   └── provider[registry.terraform.io/hashicorp/local]
    ├── module.aft_customizations
    │   ├── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    │   └── provider[registry.terraform.io/hashicorp/local]
    ├── module.aft_feature_options
    │   └── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    ├── module.aft_account_provisioning_framework
    │   └── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
    └── module.aft_account_request_framework
        ├── provider[registry.terraform.io/hashicorp/aws] >= 3.72.0, < 4.0.0
        └── provider[registry.terraform.io/hashicorp/time]

Providers required by state:

    provider[registry.terraform.io/hashicorp/random]

    provider[registry.terraform.io/hashicorp/local]

    provider[registry.terraform.io/hashicorp/archive]

    provider[registry.terraform.io/hashicorp/time]

    provider[registry.terraform.io/hashicorp/aws]

Bug Description: Error creating a new account. Possibly a problem with the Lambda function and layer.

To Reproduce Steps to reproduce the behavior:

  1. Create file for creating a new account - OK
  2. Trigger to Pipeline (ct-aft-account-request) - OK
  3. Insert data to DynamoDB - OK
  4. DynamoDB Stream trigger to Lambda (aft-account-request-action-trigger) - OK
  5. Error in Lambda execution: [ERROR] InvalidParametersException: An error occurred (InvalidParametersException) when calling the GetProvisionedProductOutputs operation: Last Successful Provisioning Record doesn't exist.

Expected behavior: A new account is created. I have already created several accounts without any problem, but today the request simply fails with this error. Creating the account through Account Factory directly works normally.

Related Logs

{ "time_stamp": "2022-08-02 19:14:27,878", "log_level": "INFO", "log_message": "Searching Account Factory for account with matching email in healthy status" }
{ "time_stamp": "2022-08-02 19:14:28,063", "log_level": "INFO", "log_message": "Getting SSM Parameter /aft/account/aft-management/sns/failure-topic-arn" }
{ "time_stamp": "2022-08-02 19:14:28,103", "log_level": "INFO", "log_message": "Sending SNS Message" }
{ "time_stamp": "2022-08-02 19:14:28,176", "log_level": "INFO", "log_message": { "MessageId": "ee325578-4444-5f56-8a1a-0e1b3d90887c", "ResponseMetadata": { "RequestId": "9cc03363-d5bc-5677-8b19-6800ac68115d", "HTTPStatusCode": 200, "HTTPHeaders": { "x-amzn-requestid": "9cc03363-d5bc-5677-8b19-6800ac68115d", "content-type": "text/xml", "content-length": "294", "date": "Tue, 02 Aug 2022 19:14:27 GMT" }, "RetryAttempts": 0 } } }
{ "time_stamp": "2022-08-02 19:14:28,179", "log_level": "ERROR", "log_message": { "FILE": "aft_account_request_action_trigger.py", "METHOD": "lambda_handler", "EXCEPTION": "An error occurred (InvalidParametersException) when calling the GetProvisionedProductOutputs operation: Last Successful Provisioning Record doesn't exist." } }

Traceback (most recent call last):
  File "/var/task/aft_account_request_action_trigger.py", line 68, in lambda_handler
    logger.info(provisioned_product_exists(event_record))
  File "/opt/python/lib/python3.8/site-packages/aft_common/account_request_framework.py", line 104, in provisioned_product_exists
    if email_exists_in_batch(account_email, pp_ids, ct_management_session):
  File "/opt/python/lib/python3.8/site-packages/aft_common/account_request_framework.py", line 121, in email_exists_in_batch
    pp_email = sc_client.get_provisioned_product_outputs(
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 415, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 745, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParametersException: An error occurred (InvalidParametersException) when calling the GetProvisionedProductOutputs operation: Last Successful Provisioning Record doesn't exist.

Additional context I've been analyzing lines:

aft-account-request-action-trigger

new_account = not provisioned_product_exists(event_record)
control_tower_updates = control_tower_param_changed(event_record)

And layer aft-common-1-4-0:

account_request_framework.py

def email_exists_in_batch(
    target_email: str, pps: List[str], ct_management_session: Session
) -> bool:
    sc_client = ct_management_session.client("servicecatalog")
    for pp in pps:
        pp_email = sc_client.get_provisioned_product_outputs(
            ProvisionedProductId=pp, OutputKeys=["AccountEmail"]
        )["Outputs"][0]["OutputValue"]
        if target_email.lower() == pp_email.lower():
            logger.info("Account email match found; provisioned product exists.")
            return True
    return False

allanmori avatar Aug 02 '22 19:08 allanmori

@allanmori thank you for reaching out.

Is this consistently failing with retries too? I would recommend updating to latest AFT v1.6.2 as there have been multiple changes and bug fixes since AFT 1.4.0, and then retrying the account request.

snebhu3 avatar Aug 02 '22 22:08 snebhu3

@snebhu3 Sorry, I couldn't find the procedure to do the update in the repository. Just change the version in ParameterStore or do you need to run something?

allanmori avatar Aug 03 '22 14:08 allanmori

@allanmori The steps to update depend on how AFT was deployed in your Control Tower environment:

  • If the AFT repository was cloned and then deployed:

    • Update cloned git repository using git pull
    • Deploy the changes by executing terraform apply against the CT management account
  • If the AFT repository is sourced from Github / Terraform Registry directly as a dependency:

    • Update AFT repository using terraform init -upgrade
    • Deploy the changes by executing terraform apply against the CT management account

snebhu3 avatar Aug 03 '22 16:08 snebhu3

Hi @snebhu3..

I upgraded to version 1.6.2, but I'm still getting the same error:

[ERROR] InvalidParametersException: An error occurred (InvalidParametersException) when calling the GetProvisionedProductOutputs operation: Last Successful Provisioning Record doesn't exist.
Traceback (most recent call last):
  File "/var/task/aft_account_request_action_trigger.py", line 65, in lambda_handler
    new_account = not provisioned_product_exists(event_record)
  File "/opt/python/lib/python3.8/site-packages/aft_common/account_request_framework.py", line 112, in provisioned_product_exists
    if email_exists_in_batch(account_email, pp_ids, ct_management_session):
  File "/opt/python/lib/python3.8/site-packages/aft_common/account_request_framework.py", line 129, in email_exists_in_batch
    pp_email = sc_client.get_provisioned_product_outputs(
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 915, in _make_api_call
    raise error_class(parsed_response, operation_name)

allanmori avatar Aug 04 '22 18:08 allanmori

In the file account_request_framework.py, I changed the function:

def get_healthy_ct_product_batch(
    ct_management_session: Session,
) -> Iterator[List[ProvisionedProductAttributeTypeDef]]:
    sc_product_search_filter: Mapping[Literal["SearchQuery"], Sequence[str]] = {
        "SearchQuery": [
            "type:CLOUD_FORMATION_TEMPLATE",
        ]
    }

The type filter was previously "CONTROL_TOWER_ACCOUNT".

I made the change in the layer, uploaded a new version, and it worked.

Thanks

allanmori avatar Aug 04 '22 19:08 allanmori

@allanmori the filter for "type:CONTROL_TOWER_ACCOUNT" is intentional, as AFT needs to look for Service Catalog provisioned products which are Control Tower accounts.

I suspect that there was a Service Catalog provisioned product in your environment which was not in a successful state, causing this failure. You could check that by logging into your Control Tower management account and looking under Service Catalog > Provisioned products.

snebhu3 avatar Aug 04 '22 22:08 snebhu3

Hi, we are running into the same issue. The import of a single Control Tower account into AFT fails because another product is in a failed enrollment state.

In this particular case I would expect the method get_healthy_ct_product_batch in aft-common to return only healthy products. Instead it throws: An error occurred (InvalidParametersException) when calling the GetProvisionedProductOutputs operation: Last Successful Provisioning Record doesn't exist.

I assume that there should be a try/except around get_provisioned_product_outputs that returns false whenever "Last Successful Provisioning Record doesn't exist." is returned. Maybe it is also possible to prevent this call from happening at all, since in our case the error is thrown when we try to import a completely different account / different product ID.

def email_exists_in_batch(
    target_email: str, pps: List[str], ct_management_session: Session
) -> bool:
    sc_client = ct_management_session.client("servicecatalog")
    for pp in pps:
        pp_email = sc_client.get_provisioned_product_outputs(
            ProvisionedProductId=pp, OutputKeys=["AccountEmail"]
        )["Outputs"][0]["OutputValue"]
        if target_email.lower() == pp_email.lower():
            logger.info("Account email match found; provisioned product exists.")
            return True
    return False
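
A minimal sketch of the try/except idea described above, not necessarily how AFT itself handles it. It assumes the Service Catalog error surfaces as a botocore ClientError whose `response["Error"]["Code"]` is `InvalidParametersException` (the code shown in the log). The function name `email_exists_in_batch_tolerant` and the two stub classes are hypothetical, added only so the example runs without AWS access; the function takes the client directly rather than a session.

```python
from typing import Any, Dict, Iterable, List

# Error code taken from the log message quoted in this thread.
MISSING_RECORD_CODE = "InvalidParametersException"


def email_exists_in_batch_tolerant(
    target_email: str, pps: Iterable[str], sc_client: Any
) -> bool:
    """Return True if any provisioned product reports target_email.

    Products whose outputs cannot be read because of
    InvalidParametersException are skipped instead of aborting the scan.
    """
    for pp in pps:
        try:
            outputs = sc_client.get_provisioned_product_outputs(
                ProvisionedProductId=pp, OutputKeys=["AccountEmail"]
            )["Outputs"]
        except Exception as exc:
            # botocore's ClientError exposes the parsed error under .response;
            # anything without that shape, or with another code, is re-raised.
            response = getattr(exc, "response", None) or {}
            if response.get("Error", {}).get("Code") == MISSING_RECORD_CODE:
                continue
            raise
        if outputs and target_email.lower() == outputs[0]["OutputValue"].lower():
            return True
    return False


# --- Hypothetical stand-ins so the sketch runs without AWS access ---
class StubClientError(Exception):
    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


class StubServiceCatalog:
    def __init__(self, emails: Dict[str, str], broken: List[str]) -> None:
        self.emails = emails
        self.broken = broken

    def get_provisioned_product_outputs(
        self, ProvisionedProductId: str, OutputKeys: List[str]
    ) -> Dict[str, Any]:
        if ProvisionedProductId in self.broken:
            raise StubClientError(MISSING_RECORD_CODE)
        email = self.emails[ProvisionedProductId]
        return {"Outputs": [{"OutputKey": "AccountEmail", "OutputValue": email}]}


client = StubServiceCatalog({"pp-good": "Team@Example.com"}, broken=["pp-tainted"])
found = email_exists_in_batch_tolerant(
    "team@example.com", ["pp-tainted", "pp-good"], client
)
```

Here the tainted product is skipped and the scan continues to the healthy one. Since `InvalidParametersException` can cover other invalid-parameter cases too, a stricter variant could also match on the error message before skipping.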

saefty avatar Aug 05 '22 12:08 saefty

@saefty, thank you for the feedback and the details. I have created a backlog item to address this use case.

snebhu3 avatar Aug 08 '22 17:08 snebhu3

We're getting the same issue when trying to create our first account using AFT 1.7.0. Our provisioned products listing says that the status is available and I can't find anything that looks like it has failed either. I'm getting the same error that everyone else has listed in the /aws/lambda/aft-account-request-action-trigger log group. Is the suggested fix to modify the get_healthy_ct_product_batch function?

samvalidere avatar Jan 04 '23 21:01 samvalidere

Hey @samvalidere,

We're aware of this issue & targeting a fix for the next release. This error can occur in some cases if any of the AWS Control Tower Account Factory provisioned products are in a TAINTED state, not just the provisioned product for the account that is being customized.

Could you confirm if you have any other TAINTED account provisioned products? If so, getting those provisioned products into an AVAILABLE state and retrying your request is a good next step.
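
One way to check for such products outside the console is sketched below, as an illustration rather than an official AFT procedure. It assumes a boto3 Service Catalog client for the Control Tower management account and uses `search_provisioned_products` with an account-level access filter, so products created by other users are included. The helper names and the stub client are hypothetical; the stub only exists so the example runs without AWS access.

```python
from typing import Any, Dict, Iterator, List


def iter_account_level_products(sc_client: Any) -> Iterator[Dict[str, Any]]:
    """Yield every provisioned product visible at account level, page by page."""
    kwargs: Dict[str, Any] = {
        "AccessLevelFilter": {"Key": "Account", "Value": "self"}
    }
    while True:
        page = sc_client.search_provisioned_products(**kwargs)
        yield from page.get("ProvisionedProducts", [])
        token = page.get("NextPageToken")
        if not token:
            return
        kwargs["PageToken"] = token


def unhealthy_products(sc_client: Any) -> List[Dict[str, Any]]:
    """Return provisioned products whose Status is anything but AVAILABLE."""
    return [
        product
        for product in iter_account_level_products(sc_client)
        if product.get("Status") != "AVAILABLE"
    ]


# --- Hypothetical stand-in that mimics two pages of results ---
class StubServiceCatalog:
    def __init__(self, pages: List[List[Dict[str, Any]]]) -> None:
        self.pages = pages

    def search_provisioned_products(self, **kwargs: Any) -> Dict[str, Any]:
        index = int(kwargs.get("PageToken", "0"))
        page: Dict[str, Any] = {"ProvisionedProducts": self.pages[index]}
        if index + 1 < len(self.pages):
            page["NextPageToken"] = str(index + 1)
        return page


stub = StubServiceCatalog(
    [
        [{"Name": "example-healthy-account", "Status": "AVAILABLE"}],
        [{"Name": "example-enroll-account", "Status": "TAINTED"}],
    ]
)
tainted = unhealthy_products(stub)
```

Against a real environment, the client would come from something like `boto3.Session(...).client("servicecatalog")` using credentials for the management account, and each returned item's `Name`, `Id`, and `Status` would show which products need attention.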

stumins avatar Jan 06 '23 22:01 stumins

I'm not sure that I fully understand as I am not seeing the phrase TAINTED anywhere, but here is everything that I checked in the AWS account where we manage control tower.

In Service Catalog, the only provisioned product is AFT, and it is listed as available.

In control tower, all of our OUs have the state Registered, all of the AWS accounts with the exception of one have the state Enrolled and the one that does not has the state Enrollment failed. We are also on landing zone version 2.2 if that matters at all.

@stumins So right now, my assumption is that this one account that has the Enrollment failed state is causing the issue, even though we have not created any accounts through AFT yet.

samvalidere avatar Jan 09 '23 14:01 samvalidere

Ok, in service catalog the provisioned products were filtered by user and not for the entire account. Once I changed that I found an Enroll Account product that was tainted. Following the directions in the AWS documentation, I had to terminate that product in order to be able to fix the problem. However, I don't recommend doing that because it removed an AWS account from the OU and now we can't get it re-enrolled.

samvalidere avatar Jan 09 '23 17:01 samvalidere

@samvalidere we went through the same stages with the Control Tower / Service Catalog UI, which is very developer-unfriendly. Control Tower / Service Catalog uses CloudFormation in the background to enroll accounts. I assume that your account is tainted because there are CloudFormation stacks in a failed state. Most likely some resources are not tracked by CloudFormation anymore.

CloudFormation tries to create some resources, which fails due to name conflicts because they were previously created but are no longer tracked in the state.

We invested a lot of time fixing those (20) tainted accounts manually through the CloudFormation UI. In some cases it was easiest to delete existing resources by using the root account to assume the AWSControlTowerExecution role, which exists in all child accounts managed by Control Tower.

For example: some accounts had the SNS topic aws-controltower-SecurityNotifications created in all regions. We deleted those so that the CloudFormation stack passed, as shown in the screenshot. It failed before because for some reason CloudFormation tried to create the resource again even though it already existed.

[Screenshot]

saefty avatar Jan 10 '23 17:01 saefty

@allanmori @samvalidere @saefty

AFT 1.8.0 fixes this issue - the AFT account provisioning workflow no longer fails when any Account Factory provisioned product is in a Tainted state without a prior successful event.

I'm going to leave this issue open for a bit to gather feedback - please update AFT to 1.8.0 and let us know via this issue if this does not resolve your problem.

stumins avatar Jan 18 '23 21:01 stumins

We haven't received any additional reports, so I'm going to close the issue as resolved. Please open a new issue if you experience this problem on AFT 1.8.0+

stumins avatar Jan 30 '23 18:01 stumins