ansible Reasoning behind no

What was / is the reasoning behind this line, setting no_log: true as default. My playbook was failing at this step, and it wasn't clear why this line was not being not logged or by setting CI=true it allowed me to overwrite. For me it was failing because I had mistyped a config value.

https://github.com/prometheus-community/ansible/blob/4af5c9bd3092d64b3e284fd5b9d044893f31a323/roles/alertmanager/tasks/configure.yml#L20

Proposed Solution

I'd like to move the validate step to a different step so it is really clear if the validation is failing, I understand that we'd don't want to log secrets to the log but this seems like a good halfway step.

Jun 01 '24 17:06 chriscn

As you correctly assumed this is to avoid leaking credentials to the logs. Unfortunately it is the only way I can think of to be absolutely sure that Ansible won't ever display sensitive data, unless we'd create a Ansible module for doing the validation.

But I could be wrong, how do you propose we'd do it in a dedicated task?

Jun 01 '24 19:06 gardar

I was thinking we could use the shell task to validate and fail if the validate failed.

I'm happy to contribute a PR but wanted to know if that's something would be well received.

Jun 02 '24 17:06 chriscn

Any task that fails may print the variables passed to it, depending on the verbosity and callback plugin settings. Even if we separate the validation into two tasks—one for the actual validation configured with failed_when: false, and another to determine if the validation should be marked as failed—there is still a risk of credentials being leaked if the first task fails due to a connection error.

In my opinion, the best approach would be to create an Ansible module for the validation. This way, we can mark sensitive parameters as secret.

Jun 02 '24 22:06 gardar

Sorry if I wasn't clear, I think having a separate task that looks similar to the following:

- name: Validate AlertManager variables
   shell: "{{ _alertmanager_binary_install_dir }}/amtool check-config %s"
   nolog: true

Thinking about it maybe renaming the current task could be useful, currently when it fails it could be for a number of reasons, does the file have space, is it a permission issue. Separating these out makes it clear why the task is failing rather than a generic 'Copy alertmanager config'.

A separate module could be a good idea, although I'm not sure where it would fit.

Jun 03 '24 08:06 chriscn

I don't think that would work because if you are using no_log on that task then you can't see why it's failing and it's essentially the same thing as we are doing now.

Jun 06 '24 17:06 gardar

Reasoning behind no_log: true

Proposed Solution