terraform-provider-aws

[Bug]: `aws_dms_replication_task` failed with error InvalidParameterValueException: TimestampColumnName cannot be an empty string.

Open thaiphv opened this issue 2 years ago • 9 comments

Terraform Core Version

1.3.2

AWS Provider Version

4.35.0

Affected Resource(s)

  • aws_dms_replication_task

Expected Behavior

The Terraform provider should create a task successfully.

Actual Behavior

The Terraform provider failed to create the resource and reported the error: InvalidParameterValueException: TimestampColumnName cannot be an empty string.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_dms_endpoint" "source" {
  endpoint_id = "dms-source"

  database_name = var.source_db_name
  endpoint_type = "source"
  engine_name   = "sqlserver"
  username      = var.source_db_username
  password      = var.source_db_password
  server_name   = var.source_db_server_name
  port          = var.source_db_server_port
}

resource "aws_dms_endpoint" "fullload" {
  endpoint_id = "fullload-task"

  endpoint_type = "target"
  engine_name   = "s3"

  s3_settings {
    add_column_name          = true
    bucket_name              = var.bucket
    bucket_folder            = var.s3_folder
    compression_type         = "NONE"
    csv_delimiter            = ","
    csv_row_delimiter        = "\\n"
    date_partition_enabled   = false
    include_op_for_full_load = true
    rfc_4180                 = false
    service_access_role_arn  = var.dms_s3_iam_role_arn
  }
}

resource "aws_dms_replication_task" "task" {
  replication_task_id = "fullload-and-cdc"

  migration_type            = "full-load-and-cdc"
  replication_instance_arn  = var.dms_replication_instance_arn
  replication_task_settings = var.fullload_cdc_task_settings

  source_endpoint_arn = aws_dms_endpoint.source.endpoint_arn
  target_endpoint_arn = aws_dms_endpoint.fullload.endpoint_arn

  table_mappings = var.mappings
}

Steps to Reproduce

Run terraform apply

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

No response

thaiphv avatar Oct 18 '22 00:10 thaiphv

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

github-actions[bot] avatar Oct 18 '22 00:10 github-actions[bot]

Just realised that the Terraform provider applied a whole lot of default settings to the endpoint:

{
    "ServiceAccessRoleArn": "<arn>",
    "ExternalTableDefinition": "",
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": ",",
    "BucketFolder": "<folder>",
    "BucketName": "<bucket>",
    "CompressionType": "NONE",
    "EncryptionMode": "SSE_S3",
    "ServerSideEncryptionKmsKeyId": "",
    "DataFormat": "csv",
    "EncodingType": "rle-dictionary",
    "DictPageSizeLimit": 1048576,
    "RowGroupLength": 10000,
    "DataPageSize": 1048576,
    "ParquetVersion": "parquet-1-0",
    "EnableStatistics": true,
    "IncludeOpForFullLoad": true,
    "CdcInsertsOnly": false,
    "TimestampColumnName": "",
    "ParquetTimestampInMillisecond": false,
    "CdcInsertsAndUpdates": false,
    "DatePartitionEnabled": false,
    "DatePartitionSequence": "yyyymmdd",
    "DatePartitionDelimiter": "slash",
    "UseCsvNoSupValue": false,
    "CsvNoSupValue": "",
    "PreserveTransactions": false,
    "CdcPath": "",
    "UseTaskStartTimeForFullLoadTimestamp": false,
    "CannedAclForObjects": "none",
    "AddColumnName": false,
    "CdcMaxBatchInterval": 60,
    "CdcMinFileSize": 32,
    "CsvNullValue": "NULL",
    "MaxFileSize": 1048576,
    "Rfc4180": false
}

TimestampColumnName was also set to "" even though I didn't set it in the Terraform configuration.

thaiphv avatar Oct 18 '22 00:10 thaiphv

By contrast, the settings of an endpoint created by CloudFormation were much less verbose:

{
    "ServiceAccessRoleArn": "<arn>",
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": ",",
    "BucketFolder": "<folder>",
    "BucketName": "<bucket>",
    "CompressionType": "NONE",
    "EnableStatistics": true,
    "DatePartitionEnabled": true
}

thaiphv avatar Oct 18 '22 00:10 thaiphv

I had a chat with the AWS support team and was told it looks like an issue with the way the aws_dms_endpoint resource is created. The Terraform provider shouldn't submit settings with default values to the API, particularly the "TimestampColumnName" setting. Even though we didn't set it, Terraform created the resource with "TimestampColumnName" set to an empty string. Then, when we used the endpoint with an aws_dms_replication_task resource, the API complained that the endpoint's "TimestampColumnName" setting must be non-empty.

thaiphv avatar Oct 18 '22 06:10 thaiphv

I've been using DMS fairly regularly, but hit this same issue today.

I tried defining "timestamp_column_name" on the endpoint and setting TimestampColumnName=ts; in the endpoint's "extra_connection_attributes", but no luck.

Then I rolled back to the 4.34.0 provider, re-ran init, and got the same error, which suggests it's a change on the AWS side, as DMS had been working fine for months.

Mistawes avatar Oct 18 '22 10:10 Mistawes

Hmm, I noticed that if I change from an S3 target endpoint to an Oracle one (same as the source), the task is created fine.

So it seems to be related to S3 target endpoints?

Mistawes avatar Oct 18 '22 12:10 Mistawes

I think so

thaiphv avatar Oct 19 '22 04:10 thaiphv

Have you found any workaround for it?

ali-raza-rizvi avatar Nov 01 '22 21:11 ali-raza-rizvi

Unfortunately, no. I abandoned my effort to Terraform the DMS pipelines.

thaiphv avatar Nov 01 '22 22:11 thaiphv

The trouble is with the way aws_dms_endpoint includes default values. Removing default values is generally considered a breaking change. I don't see this as a regression but rather a challenge with the way we've always done it that is not working well with the way AWS would like it done.

Going forward, we can potentially remove these default values, technically a breaking change, relying on AWS for the defaults, and maybe get away with it. There is risk if someone's configuration has relied on the default values the AWS provider gives versus the values AWS would give. That risk needs to be weighed against the problems this is currently causing with the aws_dms_replication_task. (The aws_dms_s3_endpoint includes fewer default values but still seems to be having some problems with aws_dms_replication_task.)

See also #28130

YakDriver avatar Jan 04 '23 23:01 YakDriver

After looking at this more, this is resolved by using the aws_dms_s3_endpoint resource with the aws_dms_replication_task resource. We apologize for the inconvenience of switching resources but hopefully it is better than waiting for v5 due to the fix requiring breaking changes. Please let us know if you have issues using aws_dms_s3_endpoint to accomplish your task.
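
As a rough sketch of that switch, the S3 target from the original report might be rewritten as shown below. This assumes the aws_dms_s3_endpoint schema in provider v4.50.0 or later, where the S3 settings are top-level arguments rather than an s3_settings block. The variables and setting values are carried over unchanged from the report, and the resource names are illustrative only.

```hcl
# Sketch only: the S3 target expressed with aws_dms_s3_endpoint instead of
# aws_dms_endpoint + s3_settings. Variables are those from the original report.
resource "aws_dms_s3_endpoint" "fullload" {
  endpoint_id   = "fullload-task"
  endpoint_type = "target"

  bucket_name             = var.bucket
  bucket_folder           = var.s3_folder
  service_access_role_arn = var.dms_s3_iam_role_arn

  # Values copied as-is from the reporter's s3_settings block.
  add_column_name          = true
  compression_type         = "NONE"
  csv_delimiter            = ","
  csv_row_delimiter        = "\\n"
  date_partition_enabled   = false
  include_op_for_full_load = true
  rfc_4180                 = false
}

resource "aws_dms_replication_task" "task" {
  replication_task_id = "fullload-and-cdc"

  migration_type            = "full-load-and-cdc"
  replication_instance_arn  = var.dms_replication_instance_arn
  replication_task_settings = var.fullload_cdc_task_settings

  source_endpoint_arn = aws_dms_endpoint.source.endpoint_arn
  # The target now points at the S3-specific endpoint resource.
  target_endpoint_arn = aws_dms_s3_endpoint.fullload.endpoint_arn

  table_mappings = var.mappings
}
```

Because this swaps one resource type for another, Terraform would be expected to plan a destroy of the old aws_dms_endpoint and a create of the new endpoint, so it is worth reviewing the plan before applying.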

YakDriver avatar Jan 05 '23 23:01 YakDriver

This functionality has been released in v4.50.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions[bot] avatar Jan 13 '23 04:01 github-actions[bot]

Still have the same issue

aws_dms_replication_task.this["cdc_ex"]: Creating...
╷
│ Error: error creating DMS Replication Task (adv2-s3-prod): InvalidParameterValueException: TimestampColumnName cannot be an empty string.
│ 	status code: 400, request id: 9cb1b36c-6e39-4a6b-898e-d065a92ca7ce
│
│   with aws_dms_replication_task.this["cdc_ex"],
│   on main.tf line 291, in resource "aws_dms_replication_task" "this":
│  291: resource "aws_dms_replication_task" "this" {
│

vvatlin avatar Jan 15 '23 17:01 vvatlin

@YakDriver aws provider 4.50.0

vvatlin avatar Jan 15 '23 17:01 vvatlin

I also ran into this issue today, but was able to fix it. Using aws_dms_endpoint as a resource, you can add your own string for the timestamp column inside the s3_settings with the timestamp_column_name argument.

Example:

resource "aws_dms_endpoint" "your_s3_endpoint" {
  endpoint_id   = "your_s3_endpoint_id"
  endpoint_type = "target"
  engine_name   = "s3"

  s3_settings {
    bucket_name             = "your_bucket_name"
    bucket_folder           = "your_bucket_folder"
    service_access_role_arn = "iam_role_arn"

    # A non-empty value here avoids the empty TimestampColumnName default.
    timestamp_column_name = "your_timestamp_column"
  }
}

DaanVandenreyt avatar Jan 26 '23 12:01 DaanVandenreyt

Using aws_dms_s3_endpoint solved the error here :) I created a new module specific for S3.

alexlopes avatar Jan 31 '23 12:01 alexlopes

I saw and tried that resource as well, but had difficulties with its outputs. For some reason, when using aws_dms_s3_endpoint the endpoint_arn attribute wouldn't work for me. So that is why I used the regular one.

DaanVandenreyt avatar Jan 31 '23 13:01 DaanVandenreyt

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Mar 03 '23 02:03 github-actions[bot]