Termination grace period is not respected when the pod is killed.
Hi Team, we started noticing odd behaviour: many state locks were not released when the tf-runners were killed, even with the graceful shutdown period set to 1 hour. After some investigation, we noticed that when the pod was restarted, SIGKILL forcefully killed the Terraform process, so it never began a graceful shutdown, which sometimes left the state lock unreleased.
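For reference, the 1-hour graceful shutdown mentioned above corresponds to the standard Kubernetes grace period on the runner pod (how it is applied to the runner pod depends on the controller's configuration):

```yaml
spec:
  # The kubelet waits this long between sending SIGTERM and SIGKILL.
  terminationGracePeriodSeconds: 3600  # 1 hour
```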
We found these related issues: https://github.com/hashicorp/terraform-exec/pull/332 https://github.com/hashicorp/terraform-exec/pull/334
logs:
2022-11-29T21:42:10.266800268Z Error message: workspace already locked (lock ID:
2022-11-29T21:42:10.266795895Z
2022-11-29T21:42:10.266789968Z Error: Error acquiring the state lock
2022-11-29T21:42:10.266760286Z
2022-11-29T21:42:07.606685256Z info creating a plan
2022-11-29T21:42:07.587542369Z info mapping the Spec.VarsFrom
2022-11-29T21:42:07.587537646Z info mapping the Spec.Vars
2022-11-29T21:42:07.587512719Z info setting up the input variables
2022-11-29T21:42:07.58182462Z Terraform has been successfully initialized!
2022-11-29T21:42:07.581792524Z
2022-11-29T21:42:07.581420214Z you run "terraform init" in the future.
2022-11-29T21:42:07.581416395Z so that Terraform can guarantee to make the same selections by default when
2022-11-29T21:42:07.58141258Z selections it made above. Include this file in your version control repository
2022-11-29T21:42:07.581408747Z Terraform has created a lock file .terraform.lock.hcl to record the provider
2022-11-29T21:42:07.581401961Z
2022-11-29T21:42:00.820815414Z Initializing provider plugins...
2022-11-29T21:42:00.820797552Z
2022-11-29T21:42:00.481037098Z use this backend unless the backend configuration changes.
2022-11-29T21:42:00.481032663Z Successfully configured the backend "remote"! Terraform will automatically
2022-11-29T21:42:00.48101017Z
2022-11-29T21:42:00.198431966Z Initializing the backend...
2022-11-29T21:41:58.413545513Z Upgrading modules...
2022-11-29T21:41:58.386167971Z }
2022-11-29T21:41:58.38616475Z "terraform_outdated": true
2022-11-29T21:41:58.386161487Z "provider_selections": {},
2022-11-29T21:41:58.386157733Z "platform": "linux_amd64",
2022-11-29T21:41:58.386153382Z "terraform_version": "1.1.9",
2022-11-29T21:41:58.386133029Z {
2022-11-29T21:41:58.251028908Z info mapping the Spec.BackendConfigsFrom
2022-11-29T21:41:58.249567118Z info initializing
2022-11-29T21:41:58.245264737Z info getting envvars from os environments
2022-11-29T21:41:58.24525401Z info setting envvars
2022-11-29T21:41:58.241270762Z info creating new terraform
2022-11-29T21:41:58.237230706Z info looking for path
2022-11-29T21:41:58.229338345Z info processing configuration
2022-11-29T21:41:58.225878148Z info write config to file
2022-11-29T21:41:58.225855343Z info write backend config
2022-11-29T21:41:58.223764885Z info preparing for Upload and Extraction
2022-11-29T21:36:25.886572001Z info cleanup TmpDir
2022-11-29T21:36:25.88396556Z error error creating the plan
2022-11-29T21:36:16.127120371Z info creating a plan
events:
2022-11-29 21:36:26 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Stopping container tf-runner
2022-11-29 21:36:26 Warning Terraform env-76461b07-9216-4d60-8a43-c5a30ddfdd8d error running Plan: rpc error: code = Internal desc = signal: killed
2022-11-29 21:36:25 Warning Terraform env-76461b07-9216-4d60-8a43-c5a30ddfdd8d Plan error: rpc error: code = Internal desc = signal: killed
2022-11-29 21:35:54 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Started container tf-runner
2022-11-29 21:35:53 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Created container tf-runner
2022-11-29 21:35:53 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Container image "ghcr.io/weaveworks/tf-runner:v0.13.0-rc.1" already present on machine
2022-11-29 21:35:51 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Successfully assigned terraform-runners/env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner
2022-11-29 21:35:46 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Successfully assigned terraform-runners/env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner
2022-11-29 21:35:46 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Stopping container tf-runner
2022-11-29 21:35:46 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Started container tf-runner
2022-11-29 21:35:46 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Created container tf-runner
2022-11-29 21:35:46 Normal Pod env-76461b07-9216-4d60-8a43-c5a30ddfdd8d-tf-runner Container image "ghcr.io/weaveworks/tf-runner:v0.13.0-rc.1" already present on machine
Is there something we can do to avoid this behaviour without setting force unlock to auto?
Thanks
@Nalum could you please give some advice here?
BTW @TarasLykhenko could you please update tf-controller to v0.13.1? v0.13.0-rc.1 has many known issues that are fixed in the more recent version.
Unfortunately I'm not sure there is another way to handle this without dealing with the locked state, at least until hashicorp/terraform-exec#334 comes into play.
You don't have to use the `auto` value, but you will need to deal with the locked state either with `auto` or manually by setting the required fields:

```yaml
spec:
  tfstate:
    forceUnlock: "yes"
    lockIdentifier: [lock-id-here]
```
FYI check out my workaround from https://github.com/weaveworks/tf-controller/issues/531#issuecomment-1452292954
This is from a write-up I did analysing the problem in some depth.
When Kubernetes terminates a pod, the init process is sent SIGTERM.
tf-controller uses tini as the init process: https://github.com/weaveworks/tf-controller/blob/29c37473906d2d664e25db86f76eb5832ec5c7ee/runner-base.Dockerfile#L66

```dockerfile
ENTRYPOINT [ "/sbin/tini", "--", "tf-runner" ]
```
tini handles received signals by forwarding them to its child process (in this case tf-runner):

```c
/* There is a signal to handle here */
switch (sig.si_signo) {
    case SIGCHLD:
        /* Special-cased, as we don't forward SIGCHLD. Instead, we'll
         * fallthrough to reaping processes.
         */
        PRINT_DEBUG("Received SIGCHLD");
        break;
    default:
        PRINT_DEBUG("Passing signal: '%s'", strsignal(sig.si_signo));
        /* Forward anything else */
        if (kill(kill_process_group ? -child_pid : child_pid, sig.si_signo)) {
            if (errno == ESRCH) {
                PRINT_WARNING("Child was dead when forwarding signal");
            } else {
                PRINT_FATAL("Unexpected error when forwarding signal: '%s'", strerror(errno));
                return 1;
            }
        }
        break;
```
Note that tini can be configured to signal an entire process group, but see below for why this doesn't affect the terraform process.
tf-runner sets itself up to handle signals; when a signal is received, it is placed onto a channel.

```go
// catch the SIGTERM from the kubelet to gracefully terminate
sigterm := make(chan os.Signal, 1)
signal.Notify(sigterm, syscall.SIGTERM)
defer func() {
	signal.Stop(sigterm)
}()
```
This channel is passed through to the [TerraformRunnerServer](https://github.com/weaveworks/tf-controller/blob/main/mtls/runner_serve.go#L47).
Within the `Apply` method, there is code that [cancels a context upon receiving the signal](https://github.com/weaveworks/tf-controller/blob/29c37473906d2d664e25db86f76eb5832ec5c7ee/runner/server.go#L356-L357).
```go
ctx, cancel := context.WithCancel(ctx)
go func() {
	select {
	case <-r.Done:
		cancel()
	case <-ctx.Done():
	}
}()
```
This cancellation is significant: the assumption here is that the tf-runner process is signalled with SIGTERM, and that this results in the cancellation of the context.
This context is passed into github.com/hashicorp/terraform-exec, which delegates execution of the generated command line to Go's exec.Command.
There are several implementations of runTerraformCmd but the Linux version is what applies here.
This code does this:
```go
cmd.SysProcAttr = &syscall.SysProcAttr{
	// kill children if parent is dead
	Pdeathsig: syscall.SIGKILL,
	// set process group ID
	Setpgid: true,
}
```
This does two different things. `Setpgid: true` puts the child process into a separate process group, which means that signals sent to the parent process (tf-runner) are never automatically propagated to the child process (terraform); this is why even configuring tini to send signals to the process group would not fix the problem we're seeing.
The other thing it does, `Pdeathsig: syscall.SIGKILL`, is crucial, because it changes how process termination occurs on Linux.
The runTerraformCmd code starts the child process and then waits for it to complete.
The current assumption is that the terraform process sets up signal handling similar to tf-runner's, and handles SIGTERM by releasing the lock.
While the terraform process is executing, tf-runner receives a SIGTERM and cancels the context, which results in SIGKILL being sent to terraform.
This is bad!
Let's go back to the Pdeathsig configuration.
This sets an attribute on the spawned process (Linux only) that defines the signal to be sent to it when the thread that started it terminates.
This is Go, we don't do threads... except that goroutines are multiplexed onto a pool of OS threads managed by the Go runtime, which can create and destroy threads as necessary.
So, at any point in time, the thread that started the terraform process could be terminated, which (because of Pdeathsig) would result in the spawned process being SIGKILLed.
https://github.com/golang/go/issues/27505 recommends locking the goroutine to its OS thread around this, which would presumably prevent the thread being killed off by the runtime.
We know that switching the Pdeathsig from SIGKILL to SIGTERM resulted in a complete drop-off in fail-to-unlock issues, which implies that the problem is due to the thread dying (as to why, it could be because of the parent process terminating).
Additionally: since sending SIGTERM to the tf-runner process would result in the context being cancelled, and process.Kill() being called, there's at least a strong suggestion that SIGTERM is not being used. If it were, then changing Pdeathsig would not make any difference, because the terraform processes would be killed by process.Kill() both before and after the change.
Kubelet will use SIGKILL without SIGTERM and a grace period, in some circumstances: https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds
Regarding the kubelet and SIGKILL without SIGTERM: I actually observed this during troubleshooting, where a group of tf-runner pods were evicted due to node pressure caused by disk (imagefs.available). The commonality was that all of the tf-runner pods scheduled on the node with disk pressure hit state-locking issues on subsequent runs.
Looks like https://github.com/hashicorp/terraform-exec/pull/512 can resolve this issue, and tfexec uses an MPL-2.0 license, so it's fine to upgrade to it.
The only problem is that it introduces a set of breaking changes in https://github.com/hashicorp/terraform-exec/releases/tag/v0.18.0, which is the reason it's pinned in the go.mod file. I haven't looked into the consequences of removing the specific error handling from tofu-controller, so that's a remaining task before this issue can be resolved.