terraform-provider-datadog

datadog_integration_aws.datadog_integration: error creating a Amazon Web Services integration: API error 502 Bad Gateway

Open • debu99 opened this issue on May 03 '19 • 12 comments

Datadog provider 1.8

Error: Error applying plan:

1 error(s) occurred:

  • datadog_integration_aws.datadog_integration: 1 error(s) occurred:

  • datadog_integration_aws.datadog_integration: error creating a Amazon Web Services integration: API error 502 Bad Gateway:

debu99 avatar May 03 '19 16:05 debu99

Hey @debu99, thanks for reporting this issue and apologies for the delay. While I work on reproducing it, do you have an example Terraform configuration that triggers this error? Thanks for any additional information!

nmuesch avatar May 28 '19 22:05 nmuesch

I went ahead and attempted to create the AWS Integration based on the provider documentation here - https://www.terraform.io/docs/providers/datadog/r/integration_aws.html#example-usage and was able to successfully create the integration.
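
For reference, a minimal configuration along the lines of that documented example (values are placeholders):

resource "datadog_integration_aws" "sandbox" {
  account_id  = "1234567890"
  role_name   = "DatadogAWSIntegrationRole"
  filter_tags = ["key:value"]
  host_tags   = ["key:value", "key2:value2"]
  account_specific_namespace_rules = {
    auto_scaling = false
    opsworks     = false
  }
}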

In addition to my previous note, are you still facing this issue?

nmuesch avatar May 28 '19 22:05 nmuesch

Hey @debu99 as I wasn't able to reproduce this issue, I'll go ahead and close this for now. Please do let me know if you continue to hit an issue with this!

nmuesch avatar Aug 13 '19 14:08 nmuesch

I have a reproduction for this problem on v2.1.0.

Here's the request (TF_LOG=1):

POST //api/v1/integration/aws?api_key=xx&application_key=xxx

Host: api.datadoghq.eu
User-Agent: Go-http-client/1.1
Content-Length: 175
Content-Type: application/json
Accept-Encoding: gzip

{
 "account_id": "XX",
 "role_name": "DatadogAWSIntegrationRole",
 "filter_tags": [],
 "host_tags": [
  "org:bla",
  "env:prod"
 ],
 "account_specific_namespace_rules": {
  "opsworks": false
 }
}

response:

2019/09/12 10:58:22 [DEBUG] Datadog API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 502 Bad Gateway
Content-Length: 107
Alt-Svc: clear
Cache-Control: no-cache
Content-Type: text/html
Date: Thu, 12 Sep 2019 13:58:22 GMT
Via: 1.1 google

<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>

-----------------------------------------------------

Some context:

  • this used to work; we're currently rewiring the mapping of our AWS accounts to Datadog. The original integration was done ~3 months ago, and the code is unchanged.
  • the AWS integration does get created correctly on the Datadog side, so the failure is somewhere on the exit path or in a secondary action while processing the request
  • note that we run against api.datadoghq.eu, not .com

Terraform file (unchanged copy; this is exactly the configuration that used to work and now fails):

#######################################################################
# Datadog integration

# https://docs.datadoghq.com/integrations/amazon_web_services/?tab=allpermissions
# https://docs.datadoghq.com/integrations/faq/aws-integration-with-terraform/

provider "aws" {
  version = "2.20.0"
}

data "aws_ssm_parameter" "datadog_api_key" {
  name = "/datadog/dd_api_key"
}

data "aws_ssm_parameter" "datadog_app_key" {
  name = "/datadog/dd_app_key"
}

provider "datadog" {
  version = "2.1.0"

  api_key = "${data.aws_ssm_parameter.datadog_api_key.value}"
  app_key = "${data.aws_ssm_parameter.datadog_app_key.value}"
  api_url = "https://api.datadoghq.eu/"
}


locals {
  role_name = "DatadogAWSIntegrationRole"
}

resource "datadog_integration_aws" "integration" {
    account_id = "${data.aws_caller_identity.current.account_id}"
    role_name = "${local.role_name}"
    //filter_tags = ["key:value"]
    host_tags = [
      "org:${var.org}",
      "env:${var.env}"
    ]
    account_specific_namespace_rules = {
        //auto_scaling = false
        opsworks = false
    }
}

data "aws_iam_policy_document" "datadog_aws_integration_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      identifiers = [
        # grant Datadog's account access ...
        "arn:aws:iam::464622532012:root"
      ]
    }

    # ... if the external ID matches
    condition {
      test = "StringEquals"
      variable = "sts:ExternalId"

      values = [
        "${datadog_integration_aws.integration.external_id}"
      ]
    }
  }
}

# https://docs.datadoghq.com/integrations/amazon_web_services/?tab=allpermissions#datadog-aws-iam-policy
data "aws_iam_policy_document" "datadog_aws_integration" {
  statement {
    actions = [
      "apigateway:GET",
        "autoscaling:Describe*",
        "budgets:ViewBudget",
        "cloudfront:GetDistributionConfig",
        "cloudfront:ListDistributions",
        "cloudtrail:DescribeTrails",
        "cloudtrail:GetTrailStatus",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "codedeploy:List*",
        "codedeploy:BatchGet*",
        "directconnect:Describe*",
        "dynamodb:List*",
        "dynamodb:Describe*",
        "ec2:Describe*",
        "ecs:Describe*",
        "ecs:List*",
        "elasticache:Describe*",
        "elasticache:List*",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeTags",
        "elasticloadbalancing:Describe*",
        "elasticmapreduce:List*",
        "elasticmapreduce:Describe*",
        "es:ListTags",
        "es:ListDomainNames",
        "es:DescribeElasticsearchDomains",
        "health:DescribeEvents",
        "health:DescribeEventDetails",
        "health:DescribeAffectedEntities",
        "kinesis:List*",
        "kinesis:Describe*",
        "lambda:AddPermission",
        "lambda:GetPolicy",
        "lambda:List*",
        "lambda:RemovePermission",
        "logs:Get*",
        "logs:Describe*",
        "logs:FilterLogEvents",
        "logs:TestMetricFilter",
        "logs:PutSubscriptionFilter",
        "logs:DeleteSubscriptionFilter",
        "logs:DescribeSubscriptionFilters",
        "rds:Describe*",
        "rds:List*",
        "redshift:DescribeClusters",
        "redshift:DescribeLoggingStatus",
        "route53:List*",
        "s3:GetBucketLogging",
        "s3:GetBucketLocation",
        "s3:GetBucketNotification",
        "s3:GetBucketTagging",
        "s3:ListAllMyBuckets",
        "s3:PutBucketNotification",
        "ses:Get*",
        "sns:List*",
        "sns:Publish",
        "sqs:ListQueues",
        "support:*",
        "tag:GetResources",
        "tag:GetTagKeys",
        "tag:GetTagValues",
        "xray:BatchGetTraces",
        "xray:GetTraceSummaries",
        // https://docs.datadoghq.com/integrations/amazon_event_bridge/
        // https://eu-central-1.console.aws.amazon.com/events/home?region=eu-central-1#/partners/datadoghq.com
        "events:CreateEventBus"
    ]

    resources = ["*"]
  }
}

resource "aws_iam_policy" "datadog_aws_integration" {
  name = "DatadogAWSIntegrationPolicy"
  policy = "${data.aws_iam_policy_document.datadog_aws_integration.json}"
}

resource "aws_iam_role" "datadog_aws_integration" {
  name = "${local.role_name}"
  description = "Role for Datadog AWS Integration"
  assume_role_policy = "${data.aws_iam_policy_document.datadog_aws_integration_assume_role.json}"
}

resource "aws_iam_role_policy_attachment" "datadog_aws_integration" {
  role = "${aws_iam_role.datadog_aws_integration.name}"
  policy_arn = "${aws_iam_policy.datadog_aws_integration.arn}"
}

unthought avatar Sep 12 '19 14:09 unthought

Hey, thanks for the reproduction steps. I'll open this issue back up for now.

nmuesch avatar Sep 12 '19 15:09 nmuesch

Hey @unthought, I think this issue was now fixed in the backend code - I'm no longer able to reproduce it. Could you please give it another try and let me know the result? Thanks!

bkabrda avatar Feb 04 '20 14:02 bkabrda

We're experiencing exactly the same problem but when creating GCP integrations.

The "failed" requests are actually creating integrations in DD, but because the response is 502, terraform is not, rightfully so, adding them to the state file.

Subsequent terraform applies return 409 error code from Datadog indicating the the resources already exist, which it does.

To fix the state, we need to manually delete the "failed" integrations in DD's UI and re-apply the terraform config. In the case listed below, we needed to go through that process multiple times to get all those integrations created correctly.
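
As a possible (unverified) alternative to deleting the integration in the UI, importing the already-created resource into state might also reconcile things; a sketch, assuming the GCP integration resource imports by its project ID:

terraform import 'module.gcp.datadog_integration_gcp.gcp-projects["xxx-1"]' xxx-1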

It almost feels like there is some sort of throttling on the Datadog API side where only 1-2 create calls return a success; the rest come back as 502s.

$ terraform apply -lock-timeout=300s plan.tfout

module.gcp.datadog_integration_gcp.gcp-projects["xxx-1"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-2"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-3"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-4"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-5"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-6"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-7"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-8"]: Creating...
module.gcp.datadog_integration_gcp.gcp-projects["xxx-1"]: Creation complete after 1s [id=xxx-1]
module.gcp.datadog_integration_gcp.gcp-projects["xxx-3"]: Creation complete after 1s [id=xxx-3]
module.gcp.datadog_integration_gcp.gcp-projects["xxx-6"]: Creation complete after 1s [id=xxx-6]

Error: error creating a Google Cloud Platform integration: API error 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"[email protected]"}

  on integrations/gcp/projects.tf line 89, in resource "datadog_integration_gcp" "gcp-projects":
  89: resource "datadog_integration_gcp" "gcp-projects" {

(The same 502 Bad Gateway error was returned for each of the four remaining instances.)

time="2020-03-06T14:58:17Z" level=fatal msg="Failed to execute a command" error="exit status 1"

msuterski avatar Mar 06 '20 16:03 msuterski

This also happens when describing a monitor. We have several monitors in the same TF configuration and it happens quite frequently. My suggestion would be to add a retry mechanism to the provider, at least for GET operations.

Error: error checking monitor exists: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"[email protected]"}

jurajseffer avatar Jul 28 '20 18:07 jurajseffer

@jurajseffer if this is still an issue, can you open a support ticket with further details so that we can investigate? It seems like a different issue than the one described here.

phillip-dd avatar Nov 05 '20 23:11 phillip-dd

I'm experiencing the same problem with the Datadog Azure integration on subsequent terraform apply runs. I had to manually delete the Datadog integration for terraform apply to succeed.

Error: error creating an Azure integration: 409 Conflict: {"errors": ["The given tenant and client already exists in your Datadog account."]}
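
If the integration already exists on the Datadog side, importing it into state may avoid the manual deletion; a sketch, assuming the Azure integration resource imports by the tenant and client ID joined by a colon:

terraform import datadog_integration_azure.example "<tenant_id>:<client_id>"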

ozgurozkan123 avatar Apr 21 '21 12:04 ozgurozkan123

Whenever a change triggers the integration to be deleted and recreated, this happens. If I rerun the terraform apply directly after, it works fine. It seems to happen directly after an integration is removed and reattached:

Error: error deleting an AWS integration Lambda ARN from https://api.datadoghq.eu/api/v1/integration/aws/logs: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.eu","twitter":"http://twitter.com/datadogops","email":"[email protected]"}
Error: error attaching Lambda ARN to AWS integration account from https://api.datadoghq.eu/api/v1/integration/aws/logs: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.eu","twitter":"http://twitter.com/datadogops","email":"[email protected]"}

As someone said above, the Lambda ARN does seem to be properly deleted/attached, but the API still answers with a 502.

Frogvall avatar Feb 25 '22 15:02 Frogvall

I'm also running into this on a terraform destroy. The resource is deleted, but the API responds with 502.

Error: error disabling Amazon Web Services log collection from https://api.datadoghq.com/api/v1/integration/aws/logs/services: 502 Bad Gateway: {"status":"error","code":502,"errors":["Bad Gateway"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"[email protected]"}
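
Since the resource is actually gone on the Datadog side, rerunning the destroy, or dropping the stale entry from state, may clear it up; a sketch with an illustrative resource address:

terraform state rm datadog_integration_aws_log_collection.main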

oblogic7 avatar May 02 '22 16:05 oblogic7