
BUG: Download grafana agent archive to local folder in case of different arch

Open davordbetter opened this issue 1 year ago • 9 comments

I have two hosts in inventory. One machine is amd64 and another is arm64.

When I run ansible-playbook on my PC, it works fine:

TASK [grafana.grafana.grafana_agent : Create Grafana Agent temp directory] ****************************************************************************************************************************************
ok: [mon-vm -> localhost]

TASK [grafana.grafana.grafana_agent : Download Grafana Agent archive to local folder] *****************************************************************************************************************************
changed: [mon-vm -> localhost]
changed: [dev-be1 -> localhost]

TASK [grafana.grafana.grafana_agent : Extract grafana-agent.zip] **************************************************************************************************************************************************
.fcst....?? grafana-agent-linux-arm64
changed: [mon-vm -> localhost]
.fcst....?? grafana-agent-linux-amd64
changed: [dev-be1 -> localhost]

TASK [grafana.grafana.grafana_agent : Set local path] *************************************************************************************************************************************************************
ok: [mon-vm]
ok: [dev-be1]

TASK [grafana.grafana.grafana_agent : Propagate downloaded binary] ************************************************************************************************************************************************
ok: [mon-vm]
diff skipped: destination file appears to be binary
diff skipped: source file size is greater than 104448
changed: [dev-be1]

The same playbook in a GitLab CI/CD pipeline does not repeat the archive download and fetches only one binary (arm64 in the log below), so the amd64 host fails:

TASK [grafana.grafana.grafana_agent : Create Grafana Agent temp directory] *****
--- before
+++ after
@@ -1,5 +1,5 @@
 {
-    "mode": "0755",
+    "mode": "0751",
     "path": "/tmp/grafana-agent",
-    "state": "absent"
+    "state": "directory"
 }
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Download Grafana Agent archive to local folder] ***
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Extract grafana-agent.zip] ***************
>f++++++.?? grafana-agent-linux-arm64
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Set local path] **************************
ok: [ssxmon-vm]
ok: [ssxdev-be1]
TASK [grafana.grafana.grafana_agent : Propagate downloaded binary] *************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [ssxdev-be1]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/grafana-agent/grafana-agent-linux-amd64' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
ok: [ssxmon-vm]

Looking at the role task:

    - name: Download Grafana Agent archive to local folder
      become: false
      ansible.builtin.get_url:
        url: "{{ _grafana_agent_download_url }}"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        mode: 0664
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false
      run_once: true

It has the option "run_once: true". Now I'm confused about why the download was repeated in my local environment, while the pipeline honored the run_once parameter.

Anyway, I think run_once should not be here, or this should be solved in some other way. On the other hand, run_once is handy when I run the playbook against a large number of VMs.
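
One possible direction, as an untested sketch only: drop run_once from the download task and serialize it with throttle: 1, so that each host checks the controller for its own architecture's archive one at a time. With force: false, get_url skips the download when the destination file already exists, so each architecture should be fetched only once. The variable names are the role's own, taken from the task above; throttle and force: false are additions for illustration and are not part of the role.

```yaml
# Sketch, not the role's actual code: run_once is replaced by throttle: 1.
- name: Download Grafana Agent archive to local folder
  become: false
  ansible.builtin.get_url:
    url: "{{ _grafana_agent_download_url }}"
    dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
    mode: "0664"
    # Assumption: skip the download if this architecture's zip is already present.
    force: false
  register: _download_archive
  until: _download_archive is succeeded
  retries: 5
  delay: 2
  delegate_to: localhost
  check_mode: false
  # Assumption: run for one host at a time so two hosts never write the same file concurrently.
  throttle: 1
```

The trade-off is that the task now runs once per host on the controller, although for every host after the first of each architecture it should be a quick no-op. Any other run_once task that depends on the architecture, such as the extract step, would presumably need the same change.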

davordbetter avatar Mar 22 '24 11:03 davordbetter

Did you find a workaround for this? I'm getting the same issue when running it on a group of hosts with both arm64 and amd64 architectures.

devmittal02 avatar Mar 29 '24 10:03 devmittal02

Hey @devmittal02, I haven't checked it out, as we are building a new role for Grafana Agent in flow mode (the recommended way now), so we can probably test this out on that.

If you want to double-check, we have a PR open, so I can get any changes you want into that right now.

ishanjainn avatar Mar 29 '24 10:03 ishanjainn

My "workaround" is to put the arm and amd VMs in different groups and run two pipelines with an inventory limit (-l).

davordbetter avatar Mar 29 '24 10:03 davordbetter

This seems like a very weird issue. @davordbetter, any thoughts on why this is failing specifically on GitLab?

@devmittal02 What platform are you running the playbook on?

ishanjainn avatar Mar 29 '24 10:03 ishanjainn

Hey, I think the issue is caused by this run_once. I am running on AWX against an entire fleet of EC2 machines; it spins up an on-demand container and triggers the playbook across the machines using SSM.

What happens is this: say the first machine it ran on was AMD, so it downloaded only that binary and stored it locally. When an ARM machine comes next, it skips the download step because of "run once", and only the previously downloaded AMD variant of the binary is available. Hence the "file doesn't exist" error: the wrong binary was fetched.

- name: Download Grafana Agent binary to controller (localhost)
  block:
    - name: Create Grafana Agent temp directory
      become: false
      ansible.builtin.file:
        path: "{{ grafana_agent_local_tmp_dir }}"
        state: directory
        mode: 0751
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Download Grafana Agent archive to local folder
      become: false
      ansible.builtin.get_url:
        url: "{{ _grafana_agent_download_url }}"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        mode: 0664
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Extract grafana-agent.zip
      become: false
      ansible.builtin.unarchive:
        src: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        dest: "{{ grafana_agent_local_tmp_dir }}"
        remote_src: false
      delegate_to: localhost
      run_once: true
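
To check which host's facts a run_once task is evaluated with in a given environment, a small diagnostic play along these lines might help. It is only a sketch: the play is hypothetical and not part of the role, and it assumes fact gathering is enabled so that ansible_facts['architecture'] is available.

```yaml
---
# Hypothetical diagnostic play, not part of the grafana_agent role.
- name: Inspect which host a run_once task is evaluated for
  hosts: all
  gather_facts: true
  tasks:
    - name: Report the host and architecture picked by run_once
      ansible.builtin.debug:
        msg: "run_once is evaluated for {{ inventory_hostname }} ({{ ansible_facts['architecture'] }})"
      delegate_to: localhost
      run_once: true
```

Running this both locally and in the pipeline should show whether the two environments pick the same first host.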

devmittal02 avatar Mar 29 '24 11:03 devmittal02

@ishanjainn I can't figure out why the same Docker image with the roles downloads both binaries on my PC, but only one in the GitLab pipeline (which is correct according to the role's run_once).

The only difference is that my PC is an M2 MacBook (emulated amd64 Docker image), while the GitLab runner runs on an amd64 Ubuntu Linux VM.

davordbetter avatar Apr 03 '24 11:04 davordbetter

The issue is indeed that the task has "run_once". It downloads the zip according to the facts of the first host; if that host has a different CPU architecture than the others, that causes the issue described.

Until this gets fixed, the simplest workaround is to separate the hosts by CPU architecture in the playbook that executes the role.

Something like this:

inventory/hosts

[amd64_hosts]
example.host.tld

[arm64_hosts]
arm.host.tld

playbook.grafana_agent.yml


---
- name: Grafana agent on amd64 hosts
  hosts: amd64_hosts
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on arm64 hosts
  hosts: arm64_hosts
  roles:
    - role: grafana.grafana.grafana_agent
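
If maintaining separate inventory groups by hand is not practical, the groups could instead be built at runtime from gathered facts. The following is a rough sketch under stated assumptions: the arch_ group prefix is made up for illustration, and the architecture values x86_64 and aarch64 are what Linux hosts typically report, so the group names may need adjusting for other values.

```yaml
---
# Sketch: build per-architecture groups dynamically, then run the role once per group.
- name: Group hosts by CPU architecture
  hosts: all
  gather_facts: true
  tasks:
    - name: Create arch_<architecture> groups from gathered facts
      ansible.builtin.group_by:
        key: "arch_{{ ansible_facts['architecture'] }}"

- name: Grafana agent on x86_64 (amd64) hosts
  hosts: arch_x86_64
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on aarch64 (arm64) hosts
  hosts: arch_aarch64
  roles:
    - role: grafana.grafana.grafana_agent
```

Because each play contains hosts of a single architecture, the role's run_once download should match every host in that play.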

gardar avatar Apr 09 '24 10:04 gardar

Based on the message in the Grafana Agent documentation:

Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.

I believe this can be closed, and migration to Alloy is required. @ishanjainn, what are your thoughts?

voidquark avatar Oct 22 '24 09:10 voidquark

This needs to be reopened. It would be really nice to have this solved, and I don't think it should be a big issue to solve. Migration to Alloy will take some time; meanwhile, we need to support the existing environment.

davordbetter avatar Nov 28 '24 10:11 davordbetter