terraform-provider-google
P1 total data loss bug in google_storage_bucket lifecycle rule for days_since_noncurrent_time
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
- Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
- If an issue is assigned to a user, that user is claiming responsibility for the issue.
- Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.
Terraform Version
Terraform v1.8.2 google v5.26.0
Affected Resource(s)
google_storage_bucket
Terraform Configuration
```hcl
resource "google_storage_bucket" "somename-test" {
  name                        = "somename-test"
  project                     = projectid
  location                    = "US-CENTRAL1"
  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  lifecycle_rule {
    action {
      type = "Delete"
    }
    condition {
      days_since_noncurrent_time = 0
    }
  }
}
```
Debug Output
No response
Expected Behavior
This should create a lifecycle rule that deletes noncurrent objects 0 days after they become noncurrent
Actual Behavior
Creates a lifecycle rule that deletes ALL objects in the bucket after 0 days, causing total data loss in the bucket
Steps to reproduce
- `terraform apply`
Important Factoids
This is a P1 issue causing total data loss
References
No response
b/339089840
Hi @philip-harvey!
According to your configuration and the description in the official documentation of the property `lifecycle.rule[].condition.daysSinceNoncurrentTime`, if you set this value to 0 then the expected result is that all resources in your bucket will be deleted after 0 days, because all the resources are recognized as noncurrent immediately while the value of the condition is equal to 0. So there is no bug; this is the normal behavior.
I'm not following the logic there. If you set this via the console, it correctly creates a rule that deletes objects 0 days after they become noncurrent, which is correct and expected. But if you set this via Terraform, it creates a totally different rule that just deletes ALL objects in the bucket, both current and noncurrent. The documentation doesn't call out this behavior anywhere that I can find, it's inconsistent with how this works in the GUI, and it causes total data loss.
This could be a documentation update of the description of this field (`lifecycle.rule[].condition.daysSinceNoncurrentTime`) in the API.
I'm still not following this. The documentation says "This condition is satisfied when an object has been noncurrent for more than the specified number of days." The bug is that Terraform creates a lifecycle rule that does not do this: it creates a rule that deletes ALL objects, not just noncurrent ones. As I said previously, this works correctly outside of Terraform, and it's a very severe bug that causes total data loss.
This is not a documentation issue, this is a bug, a very major one.
I think this issue is caused by a provider bug that is already solved: https://github.com/hashicorp/terraform-provider-google/issues/14044. You can try setting `no_age = true` in the lifecycle rule's condition block to avoid creating an unexpected `age` rule. This is the discussion link for the solution approach: https://github.com/GoogleCloudPlatform/magic-modules/pull/9512. The problem was with Terraform's handling of zero values for these data types, which is why the solution was implemented this way.
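For reference, applying that suggestion to the configuration from the issue description would look roughly like this (a sketch, not a verified fix; `no_age` is the condition field added by the linked PR, and `projectid` stands in for a real project):

```hcl
resource "google_storage_bucket" "somename-test" {
  name                        = "somename-test"
  project                     = projectid
  location                    = "US-CENTRAL1"
  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  lifecycle_rule {
    action {
      type = "Delete"
    }
    condition {
      # Suppress the implicit age = 0 condition the provider would otherwise send.
      no_age                     = true
      days_since_noncurrent_time = 0
    }
  }
}
```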
https://github.com/hashicorp/terraform-provider-google/issues/14044 looks like a similar bug, but this bug is not yet fixed in the provider.
Actually it looks like https://github.com/hashicorp/terraform-provider-google/issues/14044 is also not fixed
It is! You can try putting `no_age = true` in the lifecycle rule condition block; it won't add the unexpected `age` condition.
> Because all the resources are going to be recognized as noncurrent immediately while the value of the condition is equal to 0.
This shouldn't be the case: objects are determined to be noncurrent when they are replaced by a newer version, per https://cloud.google.com/storage/docs/object-versioning#intro. Live objects should never be removed by this rule (at least, reading it as a caller of the API).
I suspect this is a side effect of https://github.com/hashicorp/terraform-provider-google/issues/14044 without `no_age` set. Since Terraform thinks `age` has a value of zero, we're sending that, so there are two conditions in the rule. That still shouldn't delete current objects, because not all the conditions would apply: current objects are not noncurrent, and it can't be zero days since they became noncurrent.
Our effective rule as a result is:
```hcl
lifecycle_rule {
  action {
    type = "Delete"
  }
  condition {
    age                        = 0
    days_since_noncurrent_time = 0
  }
}
```
@philip-harvey do you have the REST response from a Get call on the bucket, specifically the `lifecycle` block? That's available in the provider's debug logs, or you can use the API explorer sidebar at https://cloud.google.com/storage/docs/json_api/v1/buckets/get (or the dozen other ways to call the APIs). Terraform state will also probably be correct, but its inability to handle nil values can make it more confusing than raw REST in these zero-versus-nil cases.
```json
{
  "kind": "storage#bucket",
  "selfLink": "https://www.googleapis.com/storage/v1/b/redacted",
  "id": "redacted",
  "name": "redacted",
  "projectNumber": "redacted",
  "metageneration": "6",
  "location": "US-CENTRAL1",
  "storageClass": "STANDARD",
  "etag": "CAY=",
  "timeCreated": "2024-04-30T15:41:26.753Z",
  "updated": "2024-04-30T15:55:46.810Z",
  "versioning": {
    "enabled": true
  },
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "Delete"
        },
        "condition": {
          "age": 0
        }
      }
    ]
  },
  "softDeletePolicy": {
    "retentionDurationSeconds": "604800",
    "effectiveTime": "2024-04-30T15:41:26.753Z"
  },
  "iamConfiguration": {
    "bucketPolicyOnly": {
      "enabled": true,
      "lockedTime": "2024-07-29T15:41:26.753Z"
    },
    "uniformBucketLevelAccess": {
      "enabled": true,
      "lockedTime": "2024-07-29T15:41:26.753Z"
    },
    "publicAccessPrevention": "inherited"
  },
  "locationType": "region"
}
```
Thanks! That's much worse: we're not sending `days_since_noncurrent_time` at all when it has a value of zero. Combined with `age`, which gets sent when it has a value of zero, it leads to this result where all objects get deleted.
(fyi: I edited your comment to add a code fence, to make it easier to read)
I wondered whether, if you send both age = 0 and days_since_noncurrent_time = 0, the API just ignores the days_since_noncurrent_time = 0 part since it's redundant, but I haven't verified this.
That works fine:
```json
"lifecycle": {
  "rule": [
    {
      "action": {
        "type": "Delete"
      },
      "condition": {
        "age": 0,
        "daysSinceNoncurrentTime": 0
      }
    }
  ]
},
```
@philip-harvey we're shipping a docs change immediately while we figure out the longer-term fix here (@NickElliot and @kautikdk are discussing offline). Mind giving https://github.com/GoogleCloudPlatform/magic-modules/pull/10626 a once-over to see if you think it would have helped in your case / make any suggestions if not?
This issue really isn't fixed; this is a workaround for the issue. The age should default to null instead of 0. It's still very likely people will accidentally wipe their buckets because of this bug.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.