S3 bucket rewind error when running apply
Terraform Version
Terraform v1.6.5
on darwin_arm64
Terraform Configuration Files
// providers.tf
terraform {
  required_providers {
    ...
  }
  required_version = "~> 1.6.5"
  backend "s3" {
    region  = "eu-west-1"
    bucket  = "..."
    key     = "terraform.tfstate"
    encrypt = true
  }
}
Debug Output
I can't reproduce the issue again, so I can't capture debug output, but here are the default output logs we saw:
2024-01-16T07:15:31.2911852Z ##[group]Run terraform apply -no-color -auto-approve -input=false
2024-01-16T07:15:31.2912737Z terraform apply -no-color -auto-approve -input=false
2024-01-16T07:15:31.2956434Z shell: /usr/bin/bash -e {0}
2024-01-16T07:15:31.2957094Z env:
2024-01-16T07:15:31.2958080Z TF_AWS_ROLE: arn:aws:iam::***:role/***
2024-01-16T07:15:31.2958921Z AWS_DEFAULT_REGION: eu-west-1
2024-01-16T07:15:31.2959498Z AWS_REGION: eu-west-1
2024-01-16T07:15:31.2960162Z AWS_ACCESS_KEY_ID: ***
2024-01-16T07:15:31.2960951Z AWS_SECRET_ACCESS_KEY: ***
2024-01-16T07:15:31.2970497Z AWS_SESSION_TOKEN: ***
2024-01-16T07:15:31.2971311Z TERRAFORM_CLI_PATH: /home/runner/work/_temp/8093aa56-402a-471c-af37-829ad1f1dc96
2024-01-16T07:15:31.2972191Z ##[endgroup]
2024-01-16T07:15:39.7033998Z module.*****: Refreshing state... [id=14821739]
2024-01-16T07:15:39.7035304Z
2024-01-16T07:15:39.7035795Z Error: Failed to save state
2024-01-16T07:15:39.7036852Z
2024-01-16T07:15:39.7037415Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T07:15:39.7038426Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T07:15:39.7039030Z
2024-01-16T07:15:39.7039240Z Error: Failed to persist state to backend
2024-01-16T07:15:39.7039628Z
2024-01-16T07:15:39.7040057Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T07:15:39.7041088Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T07:15:39.7042033Z to the file "errored.tfstate" in the current working directory.
2024-01-16T07:15:39.7042577Z
2024-01-16T07:15:39.7043108Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T07:15:39.7043870Z making it harder to recover.
2024-01-16T07:15:39.7044266Z
2024-01-16T07:15:39.7045109Z To retry writing this state, use the following command:
2024-01-16T07:15:39.7045794Z terraform state push errored.tfstate
Expected Behavior
Terraform apply should run successfully. If an error occurs during state sync, I would also expect the run to end immediately to prevent any divergence in state.
Actual Behavior
Terraform apply continued running but appeared to be working from bad state, causing it to create/delete resources incorrectly.
Steps to Reproduce
terraform init -no-color
terraform apply -no-color -auto-approve -input=false
As I mentioned earlier, though, we can't reliably reproduce this.
Additional Context
This was running inside a GitHub Actions workflow using hashicorp/setup-terraform@v3.
We were also using AWS OIDC to assume a role automatically in the runner: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
References
No response
FWIW, I'm seeing this same issue in v1.6.6, both from a laptop and in pipelines. After all the resource changes have happened, Terraform fails with:
| │ Error: Failed to save state
| │
| │ Error saving state: failed to upload state: operation error S3: PutObject, failed to rewind transport stream for retry, request stream is not seekable
It seems to be completely random, always in a tree with multiple workspaces. Nothing fancy in our permissions: standard AWS SSO users on laptops and EC2 instance IAM roles in CI.
Having it fail in pipelines is particularly problematic, as our pipelines don't currently save errored.tfstate if it exists, so cleaning up requires nasty manual state manipulation.
We see the same issue in v1.6.6. Please see the trace for more insights:
2024-01-16T14:27:51.0906051Z 2024-01-16T14:27:50.918Z [DEBUG] states/remote: state read lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58; lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58
2024-01-16T14:27:51.0907551Z 2024-01-16T14:27:50.934Z [INFO] backend-s3: Uploading remote state: tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=***
2024-01-16T14:27:51.0918815Z 2024-01-16T14:27:50.941Z [DEBUG] backend-s3: HTTP Request Sent: aws.region=eu-west-2 aws.s3.bucket=*** aws.s3.key=*** rpc.method=PutObject rpc.service=S3 rpc.system=aws-api tf_aws.sdk=aws-sdk-go-v2 tf_aws.signing_region="" tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=*** http.request.header.x_amz_decoded_content_length=129597 http.request.header.authorization="AWS4-HMAC-SHA256 Credential=ASIA************KEUA/20240116/eu-west-2/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;content-encoding;content-length;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-sdk-checksum-algorithm;x-amz-security-token;x-amz-server-side-encryption;x-amz-trailer, Signature=*****" http.request.header.x_amz_date=20240116T142750Z http.request.header.x_amz_content_sha256=STREAMING-UNSIGNED-PAYLOAD-TRAILER http.request.header.x_amz_trailer=x-amz-checksum-sha256 net.peer.name=***.s3.eu-west-2.amazonaws.com http.request_content_length=129679 http.request.body="[Redacted: 126.6 KB (129,679 bytes), Type: application/json]" http.url=https://***.s3.eu-west-2.amazonaws.com/***?x-id=PutObject http.request.header.x_amz_security_token="*****" http.request.header.x_amz_acl=bucket-owner-full-control http.request.header.amz_sdk_request="attempt=1; max=5" http.request.header.accept_encoding=identity http.request.header.x_amz_server_side_encryption=AES256 http.request.header.content_encoding=aws-chunked http.request.header.x_amz_sdk_checksum_algorithm=SHA256 http.method=PUT http.user_agent="APN/1.0 HashiCorp/1.0 Terraform/1.6.6 (+https://www.terraform.io) aws-sdk-go-v2/1.24.0 os/linux lang/go#1.21.5 md/GOOS#linux md/GOARCH#amd64 api/s3#1.47.5 ft/s3-transfer" http.request.header.amz_sdk_invocation_id=1fae9720-5924-45a9-832c-2bce77cca664 http.request.header.content_type=application/json
2024-01-16T14:27:51.5915142Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: reading initial snapshot from errored.tfstate
2024-01-16T14:27:51.5916619Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
2024-01-16T14:27:51.5917812Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: read nil snapshot
2024-01-16T14:27:51.5919595Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: Importing snapshot with lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 as the initial state snapshot at errored.tfstate
2024-01-16T14:27:51.5921167Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: preparing to manage state snapshots at errored.tfstate
2024-01-16T14:27:51.5922242Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: no previously-stored snapshot exists
2024-01-16T14:27:51.5923195Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: state file backups are disabled
2024-01-16T14:27:51.5924429Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: forcing lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 for migration/import
2024-01-16T14:27:51.5925663Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: writing snapshot at errored.tfstate
2024-01-16T14:27:51.5940360Z
2024-01-16T14:27:51.5940616Z Error: Failed to save state
2024-01-16T14:27:51.5940990Z
2024-01-16T14:27:51.5941454Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T14:27:51.5942382Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T14:27:51.5942793Z
2024-01-16T14:27:51.5942939Z Error: Failed to persist state to backend
2024-01-16T14:27:51.5943194Z
2024-01-16T14:27:51.5943479Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T14:27:51.5944159Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T14:27:51.5944977Z to the file "errored.tfstate" in the current working directory.
2024-01-16T14:27:51.5945328Z
2024-01-16T14:27:51.5945594Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T14:27:51.5946094Z making it harder to recover.
2024-01-16T14:27:51.5946293Z
2024-01-16T14:27:51.5946481Z To retry writing this state, use the following command:
2024-01-16T14:27:51.5946923Z terraform state push errored.tfstate
2024-01-16T14:27:51.5947279Z
2024-01-16T14:27:51.6009041Z data.aws_iam_policy_document.assume_role_policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010040Z data.aws_iam_roles.github_actions_role: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010940Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6011803Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6012988Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6013994Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6015084Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6016230Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6017237Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6018331Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6019555Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6021455Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6022749Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6023985Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6025019Z module.member-access[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6025868Z module.member-access-eu-central[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6026729Z module.member-access-us-east[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6027463Z data.aws_iam_session_context.whoami: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6028171Z data.aws_organizations_organization.root_account: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6029100Z module.ssm-cross-account-access.aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6030158Z module.instance-scheduler-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6031179Z module.member-access-us-east[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6032185Z module.member-access-eu-central[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6033154Z module.member-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
Note: the S3 bucket and Terraform backend values were further redacted with ***.
Hi, we have also been seeing this error since 1.6.5 in our pipelines. Cleanup is a nasty effort because we don't have the errored.tfstate in the pipeline. The error itself seems to come from changed types in aws-sdk-go. It should be possible to reproduce by attempting the upload while there are connection issues.
Given the timeline, this seems to be a direct result of https://github.com/hashicorp/terraform/pull/34313. The changes look very innocent, so it is possibly an upstream issue.
We are seeing these errors sporadically with version 1.6.6 as well
We're also seeing this with Terraform 1.7.1.
For us this started happening from time to time, and it even happens when there are no changes to apply:
[...]
bm-mifro-stack No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.
bm-mifro-stack
│ Error: Failed to save state
│ Error saving state: failed to upload state: operation error S3: PutObject,
│ failed to rewind transport stream for retry, request stream is not seekable
│ Error: Failed to persist state to backend
│ The error shown above has prevented Terraform from writing the updated state
│ to the configured backend. To allow for recovery, the state has been written
│ to the file "errored.tfstate" in the current working directory.
│ Running "terraform apply" again at this point will create a forked state,
│ making it harder to recover.
│ To retry writing this state, use the following command:
│ terraform state push errored.tfstate
0 Stacks deploying 1 Stack done 0 Stacks waiting
Invoking Terraform CLI failed with exit code 1
error Command failed with exit code 1.
Error: Process completed with exit code 1.
Not sure why any state is being saved if no changes are being applied.
For me, though, a rerun of the failed job usually resolves the issue.
Just updated to Terraform 1.6.6 and also seeing the issue.
As a workaround, we set max_retries on the s3 backend to 1, which appears to resolve the issue for us at the moment. This was based on the conversation in this issue: https://github.com/aws/aws-sdk-go-v2/issues/2038
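For anyone else wanting to try this, here is a minimal sketch of what the workaround could look like, reusing the backend block from the original report (the bucket value stays elided; max_retries is the documented S3 backend setting for capping AWS API retries, and judging by the follow-up error below it effectively limits the upload to a single attempt, so it sidesteps the retry path rather than fixing the underlying bug):

backend "s3" {
  region      = "eu-west-1"
  bucket      = "..."
  key         = "terraform.tfstate"
  encrypt     = true
  max_retries = 1 // workaround from this thread: cap retries so the failing rewind-and-retry path is not taken
}

The trade-off is that genuinely transient S3 or network errors will no longer be retried, so the state upload can still fail outright in those cases.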
The max_retries workaround unfortunately did not work for us (I was in part expecting this)
│ Error: Failed to save state
│
│ Error saving state: failed to upload state: operation error S3: PutObject,
│ exceeded maximum number of attempts, 1, https response error StatusCode: 0,
│ RequestID: , HostID: , request send failed, Put
│ "https://<my_bucket>.s3.ap-southeast-2.amazonaws.com/<my_tf_state_key>?x-id=PutObject":
│ write tcp 10.1.0.19:40800->52.95.130.58:443: use of closed network
│ connection
I'm seeing this issue in 1.7.3.
Saw this randomly on 1.7.1.
Having the same issue on 1.7.2. What is the latest version in which this issue does not occur?
1.6.4 is where we settled
Seeing the same issue using the GitHub Action hashicorp/setup-terraform@v2.
Seeing this with 1.6.6.
I saw this ticket was closed and just upgraded to the latest available version, 1.7.4, to pick up the fix.
... then I got the error again while deploying the upgrade.
@chrisdenton-ct 1.7.4 would not have this fix.
The fix was only merged four days ago, on 2024-03-09.
1.7.4 was released prior to that, on 2024-02-21.
1.7.5 (March 13, 2024) BUG FIXES:
backend/s3: When using s3 backend and encountering a network issue, the retry code would fail with "failed to rewind transport stream for retry". Now the retry should be successful. (https://github.com/hashicorp/terraform/pull/34796)
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.