grafana-ansible-collection icon indicating copy to clipboard operation
grafana-ansible-collection copied to clipboard

Add a config check before restarting mimir

Open panfantastic opened this issue 1 year ago • 17 comments

Hi

This modifies the mimir restart logic to ensure the configuration is valid before issuing the restart of the service. Flushing handlers to ensure it is restarted before validating the service is running/responsive.

panfantastic avatar May 10 '24 16:05 panfantastic

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 10 '24 16:05 CLAassistant

Did you consider using validate:?

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/template_module.html#parameter-validate

The validation command to run before copying the updated file into the final destination. A temporary file path is used to validate, passed in through ‘%s’ which must be present as in the examples below. Also, the command is passed securely so shell features such as expansion and pipes will not work. For an example on how to handle more complex validation than what this option provides, see handling complex validation.

Something like (have not tested this):

- name: Template Mimir config - /etc/mimir/config.yml
  ansible.builtin.template:
    src: "config.yml.j2"
    dest: "/etc/mimir/config.yml"
    owner: "mimir"
    group: "mimir"
    mode: "0644"
    validate: "mimir --config.file=%s"
  notify:
    - Restart mimir

hakong avatar May 11 '24 22:05 hakong

@hakong thats a new option to me! :D I'll try it out, but even so this will stop execution if the config file is wrong - I could have gone more complex into backing up the file before changing it but I just wanted to start simple.

panfantastic avatar May 12 '24 15:05 panfantastic

You can back up with the backup: parameter :)

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/template_module.html#parameter-backup

Create a backup file including the timestamp information so you can get the original file back if you somehow clobbered it incorrectly.

Choices: false ← (default) true

hakong avatar May 12 '24 16:05 hakong

So all in all something like:

- name: Template Mimir config - /etc/mimir/config.yml
  ansible.builtin.template:
    src: "config.yml.j2"
    dest: "/etc/mimir/config.yml"
    owner: "mimir"
    group: "mimir"
    mode: "0644"
    validate: "mimir --config.file='%s'"
    backup: true
  notify:
    - Restart mimir

hakong avatar May 12 '24 16:05 hakong

cc @GVengelen @voidquark

ishanjainn avatar May 13 '24 07:05 ishanjainn

Hello @panfantastic,

It would be great if you utilize validate in a manner similar to Loki validation.

  • Loki example:
- name: Template Loki config - /etc/loki/config.yml
  ansible.builtin.template:
.....
    validate: "/usr/bin/loki --verify-config -config.file %s"
  notify: restart loki

Also, could you please use the Fully Qualified Collection Name (FQCN) for the meta module, like this:

- name: Flush handlers after deployment
ansible.builtin.meta: flush_handlers

Additionally, please move the flush handler above the task - name: Ensure that Mimir is started. This adjustment is crucial because the task ensuring that Mimir is started needs to be executed after the flush.

voidquark avatar May 13 '24 07:05 voidquark

Thanks for the feed back! I really appreciate it!

There isn't a verify-config option as far as I can see for alloy so I'll assume you mean the same structure.

I"ll send an update to the PR shortly.

panfantastic avatar May 13 '24 19:05 panfantastic

I've been trying to submit the change to my fork but Github Actions is failing for some odd reason. I'm pretty tired now so will call it a night and try and pick this back up tomorrow.

panfantastic avatar May 13 '24 20:05 panfantastic

Regarding validation, I don't have the place to test it at the moment, but I believe you can simply follow the steps outlined in this documentation. Just make sure you're using %s parameter-validate.

I'm unsure why the restart feature has been removed, but each configuration template requires a trigger handler to restart Mimir once the configuration is changed.

In my honest opinion, having a backup for the template module is unnecessary. The role should template what is in the inventory and always reflect the state of the file. It's up to you to use Git to maintain proper inventory changes, and due to this, I believe the backup should be removed.

voidquark avatar May 13 '24 20:05 voidquark

The alternative view is that you shouldn't leave the system in a broken state such that you can't restart the service - so a change that alters a config should be reversible - i.e. you make a broken change to a config, then the system should replace the good config back rather than break the config on the system. This has never been easy to accomplish in ansible - unless it is possible now.

panfantastic avatar May 13 '24 21:05 panfantastic

The alternative view is that you shouldn't leave the system in a broken state such that you can't restart the service - so a change that alters a config should be reversible - i.e. you make a broken change to a config, then the system should replace the good config back rather than break the config on the system. This has never been easy to accomplish in ansible - unless it is possible now.

I believe everything will remain clean. If you make any configuration changes and validation fails, the real config file won't be replaced with the faulty configuration. A restart is only triggered by handler when a valid configuration is templated, ensuring that only when a valid configuration is in place will a restart occur.

voidquark avatar May 13 '24 21:05 voidquark

I neglected to read the detail on %s - I think you are correct.

panfantastic avatar May 13 '24 21:05 panfantastic

It looks like the github actions are now passing on my fork. I've not tested this against my test cluster yet, but if you are ok to send it for another stage of accepting the submission then lets try that.

panfantastic avatar May 13 '24 22:05 panfantastic

In my opinion, the last thing that needs to be addressed is the modification of the validation line.

validate: "mimir -modules --config.file=%s"

I removed the -print.config part, considering it unnecessary. It's preferable to adhere to the documentation at https://grafana.com/docs/mimir/latest/configure/about-configurations/#validate-a-configuration.

voidquark avatar May 14 '24 11:05 voidquark

Setting up mimir was very k8s heavy in suggestion - I don't use k8s so missed that section of the documentation.

panfantastic avatar May 18 '24 00:05 panfantastic

I think it can be merged now.

voidquark avatar May 18 '24 11:05 voidquark