consul-terraform-sync
Error acquiring the state lock when executing task when backend set as consul
Describe the bug
When the backend is not set explicitly in the driver "terraform" block, CTS defaults to the Consul backend. When CTS executes the task, Terraform tries to acquire the state lock and fails with the following error:
2021-12-16T16:01:29.834+0530 [INFO] ctrl: driver initialized
2021-12-16T16:01:29.834+0530 [INFO] ctrl: executing all tasks once through
2021-12-16T16:01:29.835+0530 [DEBUG] ctrl: watching dependencies: dependency_size=2
2021-12-16T16:01:30.057+0530 [DEBUG] driver.terraform: change detected for task: task_name=web
2021-12-16T16:01:30.081+0530 [INFO] ctrl: executing task: task_name=web
2021-12-16T16:01:35.681+0530 [ERROR] cli: error running controller in Once mode: error=
| could not apply changes for task web: error tf-apply for 'web': exit status 1
|
| Error: Error loading state: failed to lock state in Consul:
|
With the backend set to local, the task executes as expected.
When the module is run directly with Terraform instead of through Consul Terraform Sync, it works with the backend set to Consul.
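For reference, the local-backend workaround can be pinned in the CTS driver block. This is a minimal sketch, assuming CTS v0.4.2 accepts a `backend` block inside `driver "terraform"` (verify against the CTS configuration docs for your version); the empty `local` block relies on Terraform's default local state location.

```hcl
driver "terraform" {
  version = "1.0.11"

  # Assumption: an explicit backend here replaces the default Consul backend.
  # Empty block = Terraform's default local state path.
  backend "local" {}

  required_providers {
    fmc = {
      source  = "CiscoDevNet/fmc"
      version = "0.2.1"
    }
  }
}
```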
Versions
Consul Terraform Sync
consul-terraform-sync v0.4.2 (bc2b2a0)
Compatible with Terraform >= 0.13.0, < 1.1.0
Consul Version
Consul 1.10.4
Terraform Version
Terraform v1.0.11
Configuration File(s)
Using CTS to monitor two services in Consul and call a module that creates dynamic object mappings on Cisco FMC:

log_level = "DEBUG"

consul {
  address = <redacted>
}

driver "terraform" {
  version = "1.0.11"
  required_providers {
    fmc = {
      source  = "CiscoDevNet/fmc"
      version = "0.2.1"
    }
  }
}

terraform_provider "fmc" {
  fmc_username             = <redacted>
  fmc_password             = <redacted>
  fmc_host                 = <redacted>
  fmc_insecure_skip_verify = true
}

task {
  name        = "web"
  description = "update policies based on node availability"
  source      = "home/user/terraform-fmc-dynamicobject"
  providers   = ["fmc"]
  services    = ["web","api"]
}
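Because no backend is set above, the generated main.tf shows CTS falling back to a Consul backend at path `consul-terraform-sync/terraform`. Pinning that same backend explicitly in the driver block makes the configuration under test unambiguous when comparing it with a direct Terraform run. This is a sketch, not a confirmed fix: `gzip` and `path` are copied from the generated file, and `"<consul-address>"` is a hypothetical placeholder for the redacted address.

```hcl
driver "terraform" {
  version = "1.0.11"

  # Mirrors the backend CTS generated by default in main.tf.
  # "<consul-address>" is a placeholder for the redacted Consul address.
  backend "consul" {
    address = "<consul-address>"
    gzip    = true
    path    = "consul-terraform-sync/terraform"
  }

  required_providers {
    fmc = {
      source  = "CiscoDevNet/fmc"
      version = "0.2.1"
    }
  }
}
```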
Terraform Configuration Files Generated by Consul-Terraform-Sync
Contents of main.tf:
# This file is generated by Consul Terraform Sync.
#
# The HCL blocks, arguments, variables, and values are derived from the
# operator configuration for Sync. Any manual changes to this file
# may not be preserved and could be overwritten by a subsequent update.
#
# Task: web
# Description: update policies based on node availability
terraform {
  required_version = ">= 0.13.0, < 1.1.0"
  required_providers {
    fmc = {
      source  = "CiscoDevNet/fmc"
      version = "0.2.1"
    }
  }
  backend "consul" {
    address = <redacted>
    gzip    = true
    path    = "consul-terraform-sync/terraform"
  }
}

provider "fmc" {
  fmc_host                 = var.fmc.fmc_host
  fmc_insecure_skip_verify = var.fmc.fmc_insecure_skip_verify
  fmc_password             = var.fmc.fmc_password
  fmc_username             = var.fmc.fmc_username
}

# update policies based on node availability
module "web" {
  source   = "/home/user/terraform-fmc-dynamicobject"
  services = var.services
}
Contents of terraform.tfvars:
# This file is generated by Consul Terraform Sync.
#
# The HCL blocks, arguments, variables, and values are derived from the
# operator configuration for Sync. Any manual changes to this file
# may not be preserved and could be overwritten by a subsequent update.
#
# Task: web
# Description: update policies based on node availability
services = {
  "api.ip-1-1-1-1.dc1" = {
    id              = "api"
    name            = "api"
    kind            = ""
    address         = "1.1.1.1"
    port            = 9090
    meta            = {}
    tags            = []
    namespace       = ""
    status          = "passing"
    node            = "ip-1-1-1-1"
    node_id         = ""
    node_address    = "1.1.1.1"
    node_datacenter = "dc1"
    node_tagged_addresses = {
      lan      = "1.1.1.1"
      lan_ipv4 = "1.1.1.1"
      wan      = "1.1.1.1"
      wan_ipv4 = "1.1.1.1"
    }
    node_meta = {
      consul-network-segment = ""
    }
    cts_user_defined_meta = {}
  },
}
Expected Behavior
The task executes and the dynamic objects are updated with the new IP address mappings.
Actual Behavior
The task fails to execute with the error "Error acquiring the state lock".
Hi @sameer666, thanks for reporting this issue! I've got a few questions to help us debug and reproduce this problem.
- It looks like the CTS configuration and generated Terraform configuration you've provided are the working config, since the backend is set to local. Could you update the CTS config and main.tf with the files where Consul is the backend?
- Is the session in the error message an empty string, or have you redacted the actual value? Could you edit the removed values to be <redacted> so that we can distinguish them from empty strings?
- Can you give an overview of your Consul setup, and would you be able to share any relevant Consul logs for the 500 error?
- Consul's session documentation lists the situations in which a session is invalidated. Do you think any of these scenarios could be happening while you are running CTS?
- Is the error happening when CTS first starts, or while CTS is running after initialization has completed?
- Edited with the config that is causing the issue
- Edited
- I have EC2 instances set up in AWS running the Consul agents: two Consul servers behind a load balancer, and one web server running a Consul agent plus a service that is registered with the Consul servers. I am running Consul Terraform Sync on my local system, and the target device, Cisco FMC, is also hosted on AWS.
- None of those scenarios match.
- It happens when CTS detects a change and the task needs to be executed.
Thanks for the clarifications! Nothing is standing out to me as the root cause of your issue, and I haven't been able to reproduce it. Here are some debugging steps you could try next:
- Could you delete the working directory for the task and run CTS again? It should be sync-tasks/<taskname>.
- Could you try deleting the backend state in Consul KV? The command to do so is consul kv delete consul-terraform-sync/terraform-env:<taskname>
- Could you comment out your task and configure a different test task? One example you can use is:
task {
  name      = "test-task"
  source    = "mkam/hello/cts"
  providers = ["local"]
  services  = ["web"]
}