S3 bucket rewind error when running apply
Terraform Version
Terraform v1.6.5
on darwin_arm64
Terraform Configuration Files
// providers.tf
terraform {
  required_providers {
    ...
  }
  required_version = "~> 1.6.5"
  backend "s3" {
    region  = "eu-west-1"
    bucket  = "..."
    key     = "terraform.tfstate"
    encrypt = true
  }
}
Debug Output
I can't reproduce the issue again, so I can't capture debug output, but here are the default output logs we saw:
2024-01-16T07:15:31.2911852Z ##[group]Run terraform apply -no-color -auto-approve -input=false
2024-01-16T07:15:31.2912737Z terraform apply -no-color -auto-approve -input=false
2024-01-16T07:15:31.2956434Z shell: /usr/bin/bash -e {0}
2024-01-16T07:15:31.2957094Z env:
2024-01-16T07:15:31.2958080Z TF_AWS_ROLE: arn:aws:iam::***:role/***
2024-01-16T07:15:31.2958921Z AWS_DEFAULT_REGION: eu-west-1
2024-01-16T07:15:31.2959498Z AWS_REGION: eu-west-1
2024-01-16T07:15:31.2960162Z AWS_ACCESS_KEY_ID: ***
2024-01-16T07:15:31.2960951Z AWS_SECRET_ACCESS_KEY: ***
2024-01-16T07:15:31.2970497Z AWS_SESSION_TOKEN: ***
2024-01-16T07:15:31.2971311Z TERRAFORM_CLI_PATH: /home/runner/work/_temp/8093aa56-402a-471c-af37-829ad1f1dc96
2024-01-16T07:15:31.2972191Z ##[endgroup]
2024-01-16T07:15:39.7033998Z module.*****: Refreshing state... [id=14821739]
2024-01-16T07:15:39.7035304Z
2024-01-16T07:15:39.7035795Z Error: Failed to save state
2024-01-16T07:15:39.7036852Z
2024-01-16T07:15:39.7037415Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T07:15:39.7038426Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T07:15:39.7039030Z
2024-01-16T07:15:39.7039240Z Error: Failed to persist state to backend
2024-01-16T07:15:39.7039628Z
2024-01-16T07:15:39.7040057Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T07:15:39.7041088Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T07:15:39.7042033Z to the file "errored.tfstate" in the current working directory.
2024-01-16T07:15:39.7042577Z
2024-01-16T07:15:39.7043108Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T07:15:39.7043870Z making it harder to recover.
2024-01-16T07:15:39.7044266Z
2024-01-16T07:15:39.7045109Z To retry writing this state, use the following command:
2024-01-16T07:15:39.7045794Z terraform state push errored.tfstate
Expected Behavior
Terraform apply should run successfully. If an error occurs during state sync, I would also expect the run to end immediately to prevent any divergence in state.
Actual Behavior
Terraform apply continued running but appeared to be working from bad state, causing it to create/delete resources incorrectly.
Steps to Reproduce
terraform init -no-color
terraform apply -no-color -auto-approve -input=false
As I mentioned earlier, though, we can't reliably reproduce this.
Additional Context
This was running inside a GitHub Actions workflow using hashicorp/setup-terraform@v3.
We were also using AWS OIDC to assume a role automatically in the runner: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
References
No response
FWIW, I'm seeing this same issue in v1.6.6, both from a laptop and in pipelines. After all the resource changes have happened, Terraform fails with:
| │ Error: Failed to save state
| │
| │ Error saving state: failed to upload state: operation error S3: PutObject, failed to rewind transport stream for retry, request stream is not seekable
It seems to be completely random, always in a tree with multiple workspaces. Nothing fancy in our permissions: standard AWS SSO users on laptops and EC2 instance IAM roles in CI.
Having it fail in pipelines is particularly problematic, as our pipelines don't currently save errored.tfstate if it exists, so cleaning up requires nasty manual state manipulation.
We see the same issue in v1.6.6. Please see the trace for more insights:
2024-01-16T14:27:51.0906051Z 2024-01-16T14:27:50.918Z [DEBUG] states/remote: state read lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58; lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58
2024-01-16T14:27:51.0907551Z 2024-01-16T14:27:50.934Z [INFO] backend-s3: Uploading remote state: tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=***
2024-01-16T14:27:51.0918815Z 2024-01-16T14:27:50.941Z [DEBUG] backend-s3: HTTP Request Sent: aws.region=eu-west-2 aws.s3.bucket=*** aws.s3.key=*** rpc.method=PutObject rpc.service=S3 rpc.system=aws-api tf_aws.sdk=aws-sdk-go-v2 tf_aws.signing_region="" tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=*** http.request.header.x_amz_decoded_content_length=129597 http.request.header.authorization="AWS4-HMAC-SHA256 Credential=ASIA************KEUA/20240116/eu-west-2/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;content-encoding;content-length;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-sdk-checksum-algorithm;x-amz-security-token;x-amz-server-side-encryption;x-amz-trailer, Signature=*****" http.request.header.x_amz_date=20240116T142750Z http.request.header.x_amz_content_sha256=STREAMING-UNSIGNED-PAYLOAD-TRAILER http.request.header.x_amz_trailer=x-amz-checksum-sha256 net.peer.name=***.s3.eu-west-2.amazonaws.com http.request_content_length=129679 http.request.body="[Redacted: 126.6 KB (129,679 bytes), Type: application/json]" http.url=https://***.s3.eu-west-2.amazonaws.com/***?x-id=PutObject http.request.header.x_amz_security_token="*****" http.request.header.x_amz_acl=bucket-owner-full-control http.request.header.amz_sdk_request="attempt=1; max=5" http.request.header.accept_encoding=identity http.request.header.x_amz_server_side_encryption=AES256 http.request.header.content_encoding=aws-chunked http.request.header.x_amz_sdk_checksum_algorithm=SHA256 http.method=PUT http.user_agent="APN/1.0 HashiCorp/1.0 Terraform/1.6.6 (+https://www.terraform.io) aws-sdk-go-v2/1.24.0 os/linux lang/go#1.21.5 md/GOOS#linux md/GOARCH#amd64 api/s3#1.47.5 ft/s3-transfer" http.request.header.amz_sdk_invocation_id=1fae9720-5924-45a9-832c-2bce77cca664 http.request.header.content_type=application/json
2024-01-16T14:27:51.5915142Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: reading initial snapshot from errored.tfstate
2024-01-16T14:27:51.5916619Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
2024-01-16T14:27:51.5917812Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: read nil snapshot
2024-01-16T14:27:51.5919595Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: Importing snapshot with lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 as the initial state snapshot at errored.tfstate
2024-01-16T14:27:51.5921167Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: preparing to manage state snapshots at errored.tfstate
2024-01-16T14:27:51.5922242Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: no previously-stored snapshot exists
2024-01-16T14:27:51.5923195Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: state file backups are disabled
2024-01-16T14:27:51.5924429Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: forcing lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 for migration/import
2024-01-16T14:27:51.5925663Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: writing snapshot at errored.tfstate
2024-01-16T14:27:51.5940360Z
2024-01-16T14:27:51.5940616Z Error: Failed to save state
2024-01-16T14:27:51.5940990Z
2024-01-16T14:27:51.5941454Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T14:27:51.5942382Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T14:27:51.5942793Z
2024-01-16T14:27:51.5942939Z Error: Failed to persist state to backend
2024-01-16T14:27:51.5943194Z
2024-01-16T14:27:51.5943479Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T14:27:51.5944159Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T14:27:51.5944977Z to the file "errored.tfstate" in the current working directory.
2024-01-16T14:27:51.5945328Z
2024-01-16T14:27:51.5945594Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T14:27:51.5946094Z making it harder to recover.
2024-01-16T14:27:51.5946293Z
2024-01-16T14:27:51.5946481Z To retry writing this state, use the following command:
2024-01-16T14:27:51.5946923Z terraform state push errored.tfstate
2024-01-16T14:27:51.5947279Z
2024-01-16T14:27:51.6009041Z data.aws_iam_policy_document.assume_role_policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010040Z data.aws_iam_roles.github_actions_role: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010940Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6011803Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6012988Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6013994Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6015084Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6016230Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6017237Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6018331Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6019555Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6021455Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6022749Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6023985Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6025019Z module.member-access[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6025868Z module.member-access-eu-central[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6026729Z module.member-access-us-east[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6027463Z data.aws_iam_session_context.whoami: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6028171Z data.aws_organizations_organization.root_account: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6029100Z module.ssm-cross-account-access.aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6030158Z module.instance-scheduler-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6031179Z module.member-access-us-east[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6032185Z module.member-access-eu-central[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6033154Z module.member-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
Note: the S3 bucket and Terraform backend values were further redacted with ***.
Hi, we have also been seeing this error since 1.6.5 in our pipelines. Cleanup is a nasty effort because we don't have the errored.tfstate in the pipeline. The error itself seems to come from changed types in aws-sdk-go. It should be possible to reproduce by attempting the upload while there are connection issues.
Given the timeline, this seems to be a direct result of https://github.com/hashicorp/terraform/pull/34313. The changes look very innocent, so it is possibly an upstream issue.
We are seeing these errors sporadically with version 1.6.6 as well
We're also seeing this with Terraform 1.7.1.
For us this started happening from time to time, and it even happens when there are no changes to apply:
[...]
bm-mifro-stack No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.
bm-mifro-stack
│ Error: Failed to save state
│ Error saving state: failed to upload state: operation error S3: PutObject,
│ failed to rewind transport stream for retry, request stream is not seekable
│ Error: Failed to persist state to backend
│ The error shown above has prevented Terraform from writing the updated state
│ to the configured backend. To allow for recovery, the state has been written
│ to the file "errored.tfstate" in the current working directory.
│ Running "terraform apply" again at this point will create a forked state,
│ making it harder to recover.
│ To retry writing this state, use the following command:
│ terraform state push errored.tfstate
0 Stacks deploying 1 Stack done 0 Stacks waiting
Invoking Terraform CLI failed with exit code 1
error Command failed with exit code 1.
Error: Process completed with exit code 1.
Not sure why any state is being saved if no changes are being applied.
For me, though, a rerun of the failed job usually resolves the issue.
Just updated to Terraform 1.6.6 and also seeing the issue.
As a workaround, we set max_retries on the s3 backend to 1, which appears to resolve the issue for us at the moment. This was based on the conversation in this issue: https://github.com/aws/aws-sdk-go-v2/issues/2038
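For anyone else wanting to try this, here is a minimal sketch of what the workaround could look like, reusing the backend block from the original report (the bucket value stays elided; max_retries is the documented S3 backend setting for capping AWS API retries, and judging by the follow-up error below it effectively limits the upload to a single attempt, so it sidesteps the retry path rather than fixing the underlying bug):

backend "s3" {
  region      = "eu-west-1"
  bucket      = "..."
  key         = "terraform.tfstate"
  encrypt     = true
  max_retries = 1 // workaround from this thread: cap retries so the failing rewind-and-retry path is not taken
}

The trade-off is that genuinely transient S3 or network errors will no longer be retried, so the state upload can still fail outright in those cases.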
The max_retries workaround unfortunately did not work for us (I was in part expecting this)
│ Error: Failed to save state
│
│ Error saving state: failed to upload state: operation error S3: PutObject,
│ exceeded maximum number of attempts, 1, https response error StatusCode: 0,
│ RequestID: , HostID: , request send failed, Put
│ "https://<my_bucket>.s3.ap-southeast-2.amazonaws.com/<my_tf_state_key>?x-id=PutObject":
│ write tcp 10.1.0.19:40800->52.95.130.58:443: use of closed network
│ connection
I'm seeing this issue in 1.7.3.
Saw this randomly on 1.7.1.
Having the same issue on 1.7.2. What is the latest version in which this issue does not occur?
1.6.4 is where we settled
Seeing the same issue using the GitHub Action hashicorp/setup-terraform@v2.
Seeing this with 1.6.6.
I saw this ticket was closed and just upgraded to the latest available version, 1.7.4, to pick up the fix.
... then I got the error again while deploying the upgrade.
@chrisdenton-ct 1.7.4 would not have this fix.
The fix was only merged four days ago, on 2024-03-09.
1.7.4 was released prior to that, on 2024-02-21.
1.7.5 (March 13, 2024) BUG FIXES:
backend/s3: When using s3 backend and encountering a network issue, the retry code would fail with "failed to rewind transport stream for retry". Now the retry should be successful. (https://github.com/hashicorp/terraform/pull/34796)
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.