terraform Silence refresh output from plans and applies

Silence refresh output from plans and applies

Open joeaawad opened this issue 3 years ago • 17 comments

Current Terraform Version

0.14.2

Use-cases

Silence all of the module.abc: Refreshing state... [id=abc] output in plans and applies so that the output is more concise and easier to review. This is especially useful when you have automation iterating through a number of different terraform directories and outputting plans for each of them.

Attempted Solutions

Historically, I have been able to use:

terraform refresh 2>&1 > /dev/null && terraform plan -refresh=false

Because terraform plan has historically done a refresh of all items before running a plan, this was an adequate solution that allowed me to discard all of the successful refresh output and only output the actual plan or an error from the refresh process. However, with https://github.com/hashicorp/terraform/pull/26270 changing the refresh to occur just in time and saying that the refresh command will be deprecated in the future, it does not appear that this will continue to work moving forwards. Additionally, this solution encounters all the problems that https://github.com/hashicorp/terraform/pull/26270 was created to avoid, which is less than ideal.

Proposal

Add an argument to the plan and apply commands like -silent-refresh that hides all of the module.abc: Refreshing state... [id=abc] outputs. I believe this would simply require adding a new argument and putting https://github.com/hashicorp/terraform/blob/ef65f552c842af618ff09f2efc44d7b0df5d62a3/command/hook_ui.go#L257-L259 inside of a conditional based on that new argument.

References

https://github.com/hashicorp/terraform/pull/26270
https://github.com/hashicorp/terraform/issues/16468 has some people who are looking for the same solution but predates https://github.com/hashicorp/terraform/pull/26270 and got derailed by some other topics that are long outdated.

Dec 09 '20 06:12 joeaawad

Thanks for this feature request, @joeaawad.

We might consider a special option for this, although we do tend to resist having various different output customization options because they grow to be a bit of a maintenance nightmare over time.

With that said, a more general thing we're planning to pursue in future is to make better use of virtual terminal control codes to make the output more concise, such as by only showing information about pending operations while those operations are pending and to omit them from the final output. Our main blocker for this today is that historically Windows did not have a full virtual terminal implementation and so Terraform has been using an emulation layer to translate to the Win32 Console API. A Windows 10 update has since introduced full terminal support and so we're hoping to be able to take that emulation layer out of the mix in future and thus allow for these improvements.

In principle, do you think that having the various "Refreshing state..." messages vanish once the read is complete would address your concern about the final plan output being too verbose?

Dec 10 '20 01:12 apparentlymart

Hey @apparentlymart, thanks for the quick response and additional information about future plans.

In principal, I do believe that having the Refreshing state... messages vanish and not appear in the final terminal output would address my concerns and feature request. My main concern would be that this feature of hiding the messages is most useful in automation and I am not sure how well various automation systems (Jenkins, Cloud Build, Concourse, Github Actions, etc) would handle vanishing messages. If your implementation prevented the messages from appearing in their build logs after the command has finished, I am completely on board with your proposal as long as it is implemented before the refresh functionality is removed.

Dec 10 '20 02:12 joeaawad

@joeaawad, just in case you're not aware, or for others reading the issue, your command above has a significant difference from a normal plan with refresh -- the refresh command writes the new state to permanent storage while plan only stores it temporarily in the plan file. This can pose a problem if the remote resources are unexpectedly missing, removing them from the state entirely, then requiring rolling back the state or re-importing the resources. It's not an unheard of situation if other configuration has changed something about the resources' location, authentication, naming, etc.

Dec 10 '20 15:12 jbardin

Hi @joeaawad,

Anything we'd do with new terminal-oriented escape sequences is, I would expect, something we'd activate only when the stdout file descriptor is a terminal, so whether it would work for automated systems would depend on how they are implemented. Some systems implement terminal emulators so that hey can work broadly with various software and get a terminal-like experience, but certainly not all.

I think the tradeoff to be made here would therefore be about whether it's reasonable in an automation scenario -- when there isn't a nervous user wondering if this local Terraform process has hung -- to omit these progress-related messages altogether and have Terraform just remain silent until there's something to report. For folks with larger configurations where the plan/refresh phase might take a few minutes rather than a few seconds I expect that would still be quite unnerving, so perhaps there's a compromise in between of emitting fewer messages, perhaps even just one message saying that the refresh/plan operation is underway and therefore Terraform is doing something useful, rather than waiting for input or similar.

Dec 10 '20 21:12 apparentlymart

Considering what @jbardin noted and thinking about alternative workarounds, a different permutation I might consider for your case would be:

terraform plan >/dev/null -out=tfplan
terraform show tfplan

The above would discard all of the non-error output from the terraform plan operation, including the rendered plan, but then use the terraform show command to show the rendered plan anyway. I think that would get a result that better meets your requirements while avoiding the caveat @jbardin pointed out.

Dec 10 '20 21:12 apparentlymart

Thanks a lot, it looks like that works. While I do think it would be nice to have a simpler way to do this, I am good with closing the issue if you would like.

Dec 10 '20 23:12 joeaawad

@apparentlymart We've been using the workaround in https://github.com/hashicorp/terraform/issues/27214#issuecomment-742809948 but have noticed a difference in behavior between 0.13.5 and 0.14.6. While this was working in 0.13.5, in 0.14.6 the output of terraform plan and terragrunt show tfplan are different. Is this expected? Not seen any open issues around this.

$ terraform plan
**SNIP**
Terraform will perform the following actions:

  # module.ecs.aws_ecs_task_definition.default[0] will be updated in-place
  ~ resource "aws_ecs_task_definition" "default" {
      # Warning: this attribute value will be marked as sensitive and will
      # not display in UI output after applying this change
      ~ container_definitions    = (sensitive)
        id                       = "badgers-global-telegraf"
      + ipc_mode                 = ""
      + pid_mode                 = ""
        tags                     = {
            "Environment"       = "global"
            "Name"              = "badgers-global-telegraf"
            "Namespace"         = "badgers"
        }
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

------------------------------------------------------------------------

$ terraform show plan
Terraform will perform the following actions:

  # module.ecs.aws_ecs_task_definition.default[0] will be updated in-place
  ~ resource "aws_ecs_task_definition" "default" {
        id                       = "badgers-global-telegraf"
        tags                     = {
            "Environment"       = "global"
            "Name"              = "badgers-global-telegraf"
            "Namespace"         = "badgers"
        }
        # (10 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Feb 23 '21 14:02 joshmyers

State refresh can take pretty long. We should consider to print a . for each refreshed resource in interactive shells.

Refreshing state .............................................................

Oct 15 '21 08:10 visit1985

I'd like to also mention: sometimes the refresh output can expose sensitive data.

For instance, you have a kubernetes configmap managed by terraform, but the data field gets updated by something outside terraform.

Even if data is listed in ignore_changes, during the refresh stage, it outputs any changes that have occurred. This could expose things the user doesn't want exposed. I don't know of a way to silence this output currently

Nov 13 '21 21:11 loganmc10

Hiding the state refresh from the terraform output would be a welcome addition. We get confused users asking what it means all of the time, worried that their infrastructure has changed without their knowledge when it hasn't, wasting a lot of people's time.

I get not wanting to be hasty adding more output customisation options, but being able to -silent-refresh or similar to hide the "objects have changed outside of Terraform" segment of the plan/apply output would be fantastic.

Feb 02 '22 12:02 phyber

@loganmc10 k8s ConfigMap is not an appropriate way to store sensitive data. However, you can wrap values in the sensitive() function when passing them in and then they won't be outputted, even when refreshing. You get this behaviour for free if you use a resource that already marks the field as sensitive (e.g. k8s Secret).

Jun 26 '22 16:06 jbg

Having a CLI-flag to hide state refresh would be super helpful. Especially after terraform now handles "Changes outside of Terraform" better since v1.2. Together this would add up to a really concise output.

Jul 01 '22 14:07 norman-zon

It seems like towards the end of the discussion here this issue veered into discussing the "Changes outside of Terraform" message rather than the refresh progress messages, so I just want to be clear that this issue is focused on what the original requester was describing, which are the various progress messages Terraform CLI prints out on a per-resource basis while performing the refreshing steps during the plan phase:

null_resource.foo: Refreshing state... [id=3674870887545343702]
null_resource.bar: Refreshing state... [id=4082645487678125089]

I believe the tradeoff we're discussing here is whether it would be desirable to hide these progress reports altogether, or to present them in a more compact form that still makes it clear that something is happening without being specific about what exactly that is, so as to reduce the overall size of Terraform's output when working with a larger configuration that may have tens of resource instances already tracked in the state.

From discussion so far I think the tensions here are:

Lots of folks run Terraform in automation, often in contexts where the output isn't interpreted by a full terminal emulator. In those situations, we cannot rely on more "advanced" terminal control sequences such as rewriting an already-written line to update it in place to show more information.
Even for automation that does present the output in a way that behaves like a terminal, automation also typically captures the logs and saves them to a persistent data store for later reference, and some would prefer that the overall size of these be minimized to include only the information that they personally consider to be most relevant.
Separately, we have a UI design principle that any operation that takes more than a second or two must generate some kind of feedback in the output, because when using Terraform interactively users get understandably anxious when software like Terraform seems to have "hung", wondering if it really is making progress or if it's instead blocking for input or otherwise stuck.

Refresh operations usually hit the network and so typically do take a noticeable amount of time, but it seems that in most cases each individual refresh operation is relatively fast (<1s) and it's only the overall elapsed time across all of them that "feels slow" in the sense that we need to give UI feedback.
Internally Terraform is designed in an event-driven way where the UI is responding to notifications from Terraform Core such as "began refreshing this instance", "begun planning this instance", "finished applying this instance", etc, and so in the current design the UI doesn't really have a lot of "global context" about what is going on and is instead mostly just streaming out events to the terminal as they are emitted.

However, there is one existing exception to that rule which we could potentially expand on for a more general solution: the UI layer keeps track of which resource instances have already emitted a "started applying" event but have not emitted a "finished applying" event and, for each of then, periodically produces a "Still applying..." update message, specifically because apply steps can sometimes take multiple minutes and so that activates the anxiety I mentioned earlier that the operation has "hung" in some way. I'm not meaning to suggest that we should implement exactly that behavior for refreshing, but instead just that it demonstrates that the UI layer can in principle keep its own records about what's going on and use it to present a more global view of the process rather than just passing through Terraform Core's events as they arrive. We do of course still need to work out exactly the goals would be for that, and the details of how to meet them.
Additional options to allow choosing on a per-run basis between different kinds of output is not a design option on the table here, unless we can show that there are multiple materially-different use-cases to handle, as opposed to just differing tastes or that some people are using Terraform at an unusual scale that it was not designed to handle.

The goal is to potentially make some different compromises that will better suit the most common situations, and not to add various additional branching codepaths that would each need separate testing, maintenance, and consideration for each new Terraform feature added.

However, we can consider making use of distinctions Terraform already keeps track of and acts on. For example, the -no-color option and its environment variable equivalents are an existing signal that Terraform should not assume that its output is being interpreted by a terminal, and we don't intend to remove that capability and we can potentially use it in new ways as long as they are consistent with our understanding of the typical user intent represented by that option.

At the moment I don't personally see an obviously-correct design change to make here, but I do generally agree that Terraform's current treatment of refreshing events in particular is overly "chatty" for the relative unimportance of that step in the overall workflow -- if it were not something that can potentially take a long time I don't expect we would care to mention it in the UI at all. I would therefore like to find a way to reduce its prominence in the UI while still avoiding Terraform appearing to have "hung" in situations where the overall refresh phase takes a long time.

One initial idea I expressed above, which I think would still be worth exploring, is to reduce the refresh phase as a whole to a single message, rather than one message for each distinct resource instance. I can imagine implementing that by having the UI layer watch for the "starting refresh" and the "competed refresh" events for each individual resource instance and just keep a tally of how many refreshes are pending. The rules could then be:

When the tally transitions from zero to one, print the "Refreshing..." (or whatever) message, and start a background task to monitor the refresh phase.
When the tally transitions from one to zero, consider the refresh phase to be complete and terminate the background task.
Inside the background task, go to sleep and wake up on some periodic interval on the order of 10s to print an update like "Still refreshing...", as long as the background task hasn't yet been terminated.

A potential variant of this would be for the UI layer to also keep track of the start times of each resource instance individually to notice if a particular resource instance seems to be taking an unusually long amount of time, and if so call that one out explicitly in the "Still refreshing..." messages to help the user see that something might be "stuck" about that particular one.

One advantage of this design is that the number of lines devoted to refreshing in the output would now scale strictly by elapsed time and not by number of resource instances, and so a configuration with tens of resource instances that all refresh relatively quickly would have far fewer lines.

Does that seem like a viable compromise to folks here?

We do need to keep in mind the common challenge with this sort of feedback discussion: we typically only hear from the people who don't like the current behavior, because the people who like the current behavior have no reason to go looking for a GitHub issue that's considering changing it. So I think before we could proceed here we'd need to spend some time thinking about what might be lost by making this change for people who might rely on details of the current design, but at least the first step is to see if this compromise would be a reasonable one for those who raised this concern in the first place, and then we can move from there to whether the compromise would be also acceptable to people who don't have any problems with the current implementation.

Thanks!

Jul 01 '22 16:07 apparentlymart

Hey @apparentlymart,

I like the idea of a time based update instead of per resource update in general and that would certainly be an improvement, but as far as I am concerned, that would still not resolve this issue for my use case. I'm still strongly on the side of adding a new flag like -output-plan-only or -only-output-external-actions or introducing the concept of output levels (eg quiet, verbose, etc) that would allow for outputting just the sections between

Terraform used the selected providers to generate the
following execution plan. Resource actions are indicated
with the following symbols:
  + create

Terraform will perform the following actions:
...
Plan: 1 to add, 0 to change, 0 to destroy.

─────────────────────────────────────────────────

To help explain our use case, we have some automated jobs that, in a single job, plan dozens of Terraform root modules that collectively manage thousands of resources. As a result, output space is at an extreme premium since the more people have to scroll and face a wall of text, the less likely they are to actually read all of the output and make sure the changes are actually doing what they think they are doing.

Even if it changes to be every 10 second update messages, a handful of the larger root modules can take 10 minutes or more to refresh their state files. As a result, we would need to continue using our current custom tooling to /dev/null the refresh output. The ask that I have with this issue is to provide a native to Terraform solution that would allow us to eliminate that part of our custom tooling, which it seems like only a new flag to disable the output or an option to set the output level would accomplish that.

I'm happy to hop on a call (@korinne has my email) to help provide more information and context on our Terraform usage/automation and the scale of it if you are interested. The scale at which my company is using Terraform (5000+ state files) may be a bit unusual, but most of the state files are fairly small and I don't think our usage patterns are unusual relative to other fairly mature companies since I'm aware of at least several other companies with very similar usage patterns.

Jul 15 '22 21:07 joe-a-t

Hi @joe-a-t! Thanks for sharing these additional details.

It sounds like you'd expect -output-plan-only to produce exactly what terraform show planfile would produce, and so I'm not sure I see how that would differ from the "discard stdout from terraform plan -out=tfplan and then run terraform show tfplan" pattern we've already discussed above.

Would it really make a material difference if it were Terraform discarding the output, rather than /dev/null discarding the output? Or is it that you actually want terraform plan to generate something more than what terraform show tfplan would produce, but specifically not to include the "refreshing" output?

I personally don't feel very excited about adding a new option to achieve something that can already be achieved with Terraform as it's currently implemented, but if there were something unique about this new mode that would be broadly interesting to a lot of users then of course that would be a different tradeoff.

I'm sure @korinne would be happy to meet and talk in more detail, though of course I'll leave it up to her to reach out about that!

Jul 15 '22 23:07 apparentlymart

I guess the biggest net new functionality part that I really care about is https://github.com/hashicorp/terraform/issues/29808 since the big problem that I have is the fact that we currently have to do regex filtering to get rid of the Objects have changed outside of Terraform output even when doing the terraform show pattern.

I think it would certainly be nice and in my personal opinion the best solution to have a single flag we can set that tells Terraform we only want a minimal output (eg no refreshing resources output and no objects have changed outside of Terraform output). But if we had to do two separate things to get rid of both of those output types, that would be fine if it did not involve us needing to do a more expensive regex to get the desired output.

Jul 18 '22 16:07 joe-a-t

Thank you @apparentlymart for your detailed answer 🙏

As you said, I think that a lot of folks use Terraform in automation context and don't need interactivity in this case.

Separately, we have a UI design principle that any operation that takes more than a second or two must generate some kind of feedback in the output, because when using Terraform interactively users get understandably anxious when software like Terraform seems to have "hung", wondering if it really is making progress or if it's instead blocking for input or otherwise stuck.

This makes Terraform a great CLI tool for interactive usage, but in automation context this interactivity can be a burden. A lot of tools propose a non-interactive mode with flags like -q, -s, --silent, --suppress-output, --capture=no. This way, users can disable interactivity in specific cases when they don't need it 👍

I see this issue as a specific case like this. And the flags -silent-refresh and -silent-object-changed as a solution in this context.

We do need to keep in mind the common challenge with this sort of feedback discussion: we typically only hear from the people who don't like the current behavior, because the people who like the current behavior have no reason to go looking for a GitHub issue that's considering changing it. So I think before we could proceed here we'd need to spend some time thinking about what might be lost by making this change for people who might rely on details of the current design, but at least the first step is to see if this compromise would be a reasonable one for those who raised this concern in the first place, and then we can move from there to whether the compromise would be also acceptable to people who don't have any problems with the current implementation.

I totally agree with you, but in this case I think that adding a -silent-xxxx flag won't change anything for those who rely on the current implementation. They just won't use this flag ☺️

Aug 02 '22 09:08 gforien

@apparentlymart In my case, the problem is that terraform plan -out=tfplan does not work since Terraform Cloud w/ remote state doesn't support output plan to a file. What would be a workaround to display outputs only if any?

Nov 16 '22 16:11 caio-eiq

I just end up doing this when I want to suppress all the noise. terraform plan | grep -v "Refreshing state...\|Reading...\|Read complete after"

Nov 24 '22 22:11 cregkly

@apparentlymart After considering your great write up, and follow up here / there, discussing a silence option, I would even go further and propose that all the "Refreshing state...", "Reading...", "Read complete after" is actually more appropriate for a -v verbose flag, and the default behavior omits all this verbose output. Without this Terraform is indeed "chatty", as you called it.

(Due to the impact this inevitably will have, it would be best done during a version bump.)

In the mean while also I use a work-around like @cregkly commented

Jan 06 '23 18:01 jeroenhabets

If anyone comes across this thread from Google and is shocked that Hashicorp still hasn't provided a quiet output flag, this gist will help if you are willing to apply to the source code, compile, then use that binary.

https://gist.github.com/et304383/998a155363c66ae0de3ee62fac155e70

Save the file locally. Then after cloning the terraform source code:

cd /path/to/terraform
git checkout v1.4.6
git apply ~/Downloads/tf_1.4.6_less_output.patch
go install .
ln -s ~/go/bin/terraform ~/bin/tfq

Obviously modify these commands as neccessary for wherever you go bin path is and where you want the executable symlink to be located and named.

My output:

$ tfq plan

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.
Releasing state lock. This may take a few moments...
$

I get that it's not a flag, but it is a separate binary so you can use it as you wish if you want to run quiet terraform.

Sep 08 '23 17:09 et304383

terraform terraform copied to clipboard

Silence refresh output from plans and applies

Current Terraform Version

Use-cases

Attempted Solutions

Proposal

References

terraform
terraform copied to clipboard