terraform-python-testing-helper Ideas for skipping cleanup on error?

Just ran an apply/destroy cycle through tftest, where on destroy I ran into a timing error when waiting for a resource to change to the expected state:

Error: Error waiting for VPC Peering Connection (pcx-....) to be deleted: timeout while waiting for state to become 'rejected, deleted' (last state: 'active', timeout: 1m0s)

Ok, not great, but no big deal, just go into the test directory and re-run destroy... Except, the tfstate file is gone, so can't do that!

Jan 06 '22 19:01 lorengordon

For now, since we always run apply/destroy in our current usage, I'm gonna use setup(upgrade=True, cleanup_on_exit=False)... That way the .terraform directory will always be refreshed, and with a working destroy the tfstate file should always be empty anyway...

Jan 06 '22 19:01 lorengordon

I was going to suggest that. If you get a better idea let's see if it can be implemented.

Jan 06 '22 19:01 ludoo

I'm not super familiar with the _finalizer thing, which I assume is a feature of pytest? Right now, it is only configured in the setup method, and the other methods do not influence it...

Jan 06 '22 19:01 lorengordon

We wrap the setup apply and destroy in a fixture:

@pytest.fixture(scope='module')
def output(scenarios_dir, tf_vars, scenario):
    print(f'Setup starting for {scenario}')
    tf = tftest.TerraformTest(scenario, scenarios_dir)
    tf.setup()
    try:
        tf.apply(tf_vars=tf_vars)
        print(f'setup complete for {scenario}')
        yield tf.output()
        tf.destroy(**{'auto_approve': True}, tf_vars=tf_vars)
    except Exception as exc:
        print(f'Issue during apply, attempting to clean up {exc}')
        tf.destroy(**{'auto_approve': True}, tf_vars=tf_vars)

It isn't perfect, but effectively puts a retry on destroy if a transient error occurs.

Jan 11 '22 11:01 grahamhar

Ach, completely forgot to answer this, sorry!

@grahamhar I'm struggling to understand your solution (my fault ofc): setup() has cleanup_on_exit defaulting to True, so cleanup of state files in the finalizer happens anyway?

IMHO a simple way of avoiding the above is of course to pass cleanup_on_exit=False to setup(), then trapping the potential exception raised during destroy as in Graham's example, and cleaning up state manually only if destroy succeeds.

Jan 11 '22 11:01 ludoo

@ludoo Sorry I probably should have provided more explanation, let me provide a few examples of how I think the approach I suggested helps.

Issue with Terraform code under test that results in a failure under the apply phase, this captures the exception thrown from the apply and runs a destroy which should clean up any resources already created. As Terraform in many cases doesn't validate parameters it's easy to get 400 type response back which halt execution leaving some resources created.
Issue in Destroy caused by a timeout, generally when this happens a second run of destroy would rectify this, not perfect but we've found this to be the case.
Occasionally we get race conditions like a policy on an S3 bucket trying to be deleted while another S3 operation is underway, again the second attempt in the except block normally resolves this.

I agree it is probably a good idea to set cleanup_on_exit to False.

Jan 11 '22 12:01 grahamhar

@grahamhar With that config, what does pytest report if the apply succeeds, the first destroy fails, and the second destroy succeeds? Is it still a test pass (I would think so...)?

Jan 11 '22 19:01 lorengordon

Yes you get a test pass which in our use case is what we want.

If that is not what you want add a pytest.fail to the except block

Jan 11 '22 19:01 grahamhar