ansible-datadog
When upgrading datadog v6 to v7 using new 2022 apt keys, role breaks
Hello everyone,
I use the ansible role and recently updated it from 4.2.1 to 4.15.0 so I could upgrade datadog from v6 to v7 on my machines.
In the role I am setting: datadog_agent_version: "1:7.36.1-1"
So when I ran the role:
TASK [datadog.datadog : Install apt-transport-https] ********************************
ok: [machine]
fatal: [machine]: FAILED! => {"changed": false, "msg": "Failed to update apt cache: "}
This happens because in the pkg-debian.yaml it has:
- name: Install apt-transport-https
  apt:
    update_cache: yes
    name: apt-transport-https
    state: present
  when: not ansible_check_mode
But when apt-get update runs as part of this task, there are errors regarding keys:
Hit:1 http://azure.archive.ubuntu.com/ubuntu bionic InRelease
Hit:2 http://azure.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:3 http://azure.archive.ubuntu.com/ubuntu bionic-backports InRelease
Hit:4 http://repo.percona.com/percona/apt bionic InRelease
Hit:5 http://repo.percona.com/prel/apt bionic InRelease
Hit:6 https://artifacts.elastic.co/packages/7.x/apt stable InRelease
Hit:7 https://archive.mariadb.org/mariadb-10.2.31/repo/ubuntu bionic InRelease
Ign:8 https://apt.datadoghq.com stable InRelease
Hit:10 http://security.ubuntu.com/ubuntu bionic-security InRelease
Get:9 https://apt.datadoghq.com stable Release [9927 B]
Get:11 https://apt.datadoghq.com stable Release.gpg [801 B]
Err:11 https://apt.datadoghq.com stable Release.gpg
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 33EE313BAD9589B7
Fetched 10.7 kB in 1s (9893 B/s)
Reading package lists... Done
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://apt.datadoghq.com stable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 33EE313BAD9589B7
W: Failed to fetch https://apt.datadoghq.com/dists/stable/Release.gpg The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 33EE313BAD9589B7
W: Some index files failed to download. They have been ignored, or old ones used instead.
The only way I found to resolve this was to add a task that removes /etc/apt/sources.list.d/ansible_datadog_6.list and /etc/apt/sources.list.d/ansible_datadog_7.list (which contain deb https://apt.datadoghq.com/ stable X) and then re-run the role.
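For anyone who needs the same workaround, here is a minimal sketch of such a task. It is not part of the role; the task name is made up, and it assumes the task runs before the datadog.datadog role (for example in pre_tasks) with privilege escalation.

```yaml
# Hypothetical workaround task, not part of ansible-datadog.
# Run it before the datadog.datadog role so that the first
# "apt update" no longer hits the Datadog repository with an unknown key.
- name: Remove stale Datadog APT source lists
  file:
    path: "{{ item }}"
    state: absent
  loop:
    - /etc/apt/sources.list.d/ansible_datadog_6.list
    - /etc/apt/sources.list.d/ansible_datadog_7.list
```

The role should then recreate the source list it needs on the next run.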
Hi 👋, thanks for opening this issue. My understanding of what happened is the following:
- You've been using ansible-datadog 4.2.1. This version didn't yet know about the new public APT key 33EE313BAD9589B7.
- As per https://docs.datadoghq.com/agent/guide/linux-agent-2022-key-rotation/?tab=redhatcentossuse&tabs=debianubuntu, we rotated the key on May 2.
- You tried upgrading to ansible-datadog 4.15.0 recently, but the run failed. It failed because it couldn't update the repodata (it didn't know the signing key yet), which it has to do to install packages needed locally by our role.
As a temporary fix, I think it's perfectly fine to ensure that /etc/apt/sources.list.d/ansible_datadog_6.list and /etc/apt/sources.list.d/ansible_datadog_7.list are not present.
In theory, this could happen again when we do the next rotation (in ~2 years), so we'll have to figure out a systematic way to solve this. One way that comes to my mind is temporarily disabling the datadog sources while installing the other packages and only re-enabling them once all other packages (and the new keys) are installed. This could be a bit tedious to implement/maintain, so I'll try to see if there's a better solution, but at least we will have a solution.
So I tried temporarily disabling the datadog sources while installing the other packages and re-enabling them afterwards. There is a problem with this approach:
APT doesn't know the concept of "temporarily disabling a source for a single operation", so the only way to do this is to delete or rename the source list file. This, however, has unintended consequences:
- First we temporarily delete the datadog source list
- Then we install dependencies (which will run apt update in the background)
- Updating the APT cache will remove records for the datadog source list, because it's no longer configured
- Then we re-add the datadog source list
- We now have to run apt update again to redownload the datadog source repodata, because we no longer have them
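To make the sequence above concrete, here is a rough sketch of what it would look like as tasks. This is only an illustration of the approach, not what the role does; the ".disabled" suffix and the task names are assumptions, and it relies on APT only reading files in sources.list.d that end in .list or .sources.

```yaml
# Sketch only: illustrates the disable / re-enable sequence.
# The ".disabled" suffix is hypothetical; any name APT ignores would do.
- name: Temporarily disable the Datadog source list
  command: >
    mv /etc/apt/sources.list.d/ansible_datadog_7.list
       /etc/apt/sources.list.d/ansible_datadog_7.list.disabled
  args:
    removes: /etc/apt/sources.list.d/ansible_datadog_7.list

# This cache update drops the Datadog records, because the source
# is no longer configured at this point.
- name: Install dependencies
  apt:
    update_cache: yes
    name: apt-transport-https
    state: present

# (the new signing keys would be installed here)

- name: Re-enable the Datadog source list
  command: >
    mv /etc/apt/sources.list.d/ansible_datadog_7.list.disabled
       /etc/apt/sources.list.d/ansible_datadog_7.list
  args:
    removes: /etc/apt/sources.list.d/ansible_datadog_7.list.disabled

# The extra cost: the Datadog repodata has to be downloaded again.
- name: Refresh the APT cache again
  apt:
    update_cache: yes
```

The final task is the one that would have to run on every single play, which is the drawback of this approach.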
While we could always do the last step, it's not optimal to have to redownload the repodata on every single ansible run. I'll try to think a little bit more to see if we could get a better solution for this.