Add version pinning and telemetry for APM tracer libraries
Telemetry to send analytics to datadog for tracer library injection. Version pinning for specific tracer libraries.
Reference APMON-356
@jonbodner Feel free to merge this when you have a final look. Thanks!
@bkabrda - I've addressed all of your comments. Please take a look and let me know your thoughts. Thanks.
@bkabrda Ready for review again. Resolved all of your comments and re-tested in both debian and rhel environments.
@schneideradam nice work, I think the code looks really good now. You will need to make sure that the CI is fixed, however. In the meanwhile (some of the failures are very likely related to certain CI fixes that we've done recently, so feel free to rebase on the latest main branch, that should help a lot).
I'm going to do some local testing and play around with the feature a little bit to see if everything is 100 % and let you know if I find more places to improve.
@bkabrda Thanks. I think the main failure is coming from the guidance you gave to modify the deprecation alert message to include fail, when, and ignore_errors. I think the linter doesn't like those. https://github.com/DataDog/ansible-datadog/pull/541#discussion_r1495395668. Not sure how to fix this as its what you guided to doing. It seems I could write a custom module but that seems like overkill. Thoughts?
@schneideradam please use failed_when: false instead of ignore_errors: true. That should fix it.
@schneideradam some tests in the CI are still failing. For example, I'm seeing this error:
fatal: [127.0.0.1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'agent_dd_apm_install_pkgs' is undefined\n\nThe error appears to be in '/root/project/tasks/pkg-redhat.yml': line 96, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Set plain includepkgs variable\n ^ here\n"}
Can you please make sure CI is green?
@bkabrda CI Passing. Let me know if there's anything else here.
Hi @schneideradam, I did some more testing and found some issues with the pinning logic. My reproducer is:
- Pin version (non-latest) of a specific tracer
- Run Ansible to get it installed
- Now unpin the version to get it upgraded to latest
- Run Ansible again
On Debian/Ubuntu, this errors out:
failed: [localhost] (item=datadog-apm-library-python) => {"ansible_loop_var": "item", "cache_update_time": 1678760944, "cache_updated": false, "changed": false, "item": "datadog-apm-library-python", "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'datadog-apm-library-python=2.6.5-1'' failed: E: Held packages were changed and -y was used without --allow-change-held-packages.\n", "rc": 100, "stderr": "E: Held packages were changed and -y was used without --allow-change-held-packages.\n", "stderr_lines": ["E: Held packages were changed and -y was used without --allow-change-held-packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following held packages will be changed:\n datadog-apm-library-python\nThe following packages will be upgraded:\n datadog-apm-library-python\n1 upgraded, 0 newly installed, 0 to remove and 163 not upgraded.\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following held packages will be changed:", " datadog-apm-library-python", "The following packages will be upgraded:", " datadog-apm-library-python", "1 upgraded, 0 newly installed, 0 to remove and 163 not upgraded."]}
On CentOS/RHEL, this doesn't fail, but only upgrades the unpinned tracer after Ansible is executed twice in a row (the first time the repofile gets updated after the installation, on the second pass this allows yum/dnf to actually upgrade).
For Debian, you should follow the install_script logic, which explicitly unpins first, then installs and then pins. For CentOS/RHEL, you should write the repofile first before doing any install/upgrade, because that will allow the upgrade to happen correctly.
I think this is the last bit that we're missing. Thanks!
@schneideradam unfortunately there are still issues with the new logic on both Ubuntu and CentOS, let me explain:
- On Ubuntu, you can't use dpkg_selections before the package is installed locally, so this fails when first installing packages. I think probably the easiest thing we can do here is use
failed_when: falseon the task (the other way would be to do this only for packages that already have an entry in thedpkg --get-selectionsoutput, but that's probably too overcomplicated). - On CentOS/RHEL, the packages don't get pinned - running a
dnf updatestill updates them. The reason is that theincludepkgsline looks something like this:includepkgs = datadog-agent datadog-apm* datadog-apm-library-js-4.11.1-1 datadog-apm-library-ruby-1.20.0-1- thedatadog-apm*is extraneous in this case and will override the pin that comes later. In case any tracer packages are explicitly enumerated, we should not use this pin, but rather list them instead of using thedatadog-apm*glob (and not forget to add datadog-apm-inject).
Does that make sense? Sorry that this is dragging for so long, but we need to get this right to ensure good customer experience.
Hey, sorry I wasn't quite ready for a review. I was testing on a remote machine and I had to submit commits to send it over. I'll ping you when its ready for review. I'm looking into the issues, and yes I think there's a regression where RHEL is no longer grabbing the package version. I'll get this fixed up and let you know when ready. Sorry for all of the back and forth.
On Wed, Mar 6, 2024 at 3:30 AM Slavek Kabrda @.***> wrote:
@schneideradam https://github.com/schneideradam unfortunately there are still issues with the new logic on both Ubuntu and CentOS, let me explain:
- On Ubuntu, you can't use dpkg_selections before the package is installed locally, so this fails when first installing packages. I think probably the easiest thing we can do here is use failed_when: false on the task (the other way would be to do this only for packages that already have an entry in the dpkg --get-selections output, but that's probably too overcomplicated).
- On CentOS/RHEL, the packages don't get pinned - running a dnf update still updates them. The reason is that the includepkgs line looks something like this: includepkgs = datadog-agent datadog-apm* datadog-apm-library-js-4.11.1-1 datadog-apm-library-ruby-1.20.0-1 - the datadog-apm* is extraneous in this case and will override the pin that comes later. In case any tracer packages are explicitly enumerated, we should not use this pin, but rather list them instead of using the datadog-apm* glob (and not forget to add datadog-apm-inject).
Does that make sense? Sorry that this is dragging for so long, but we need to get this right to ensure good customer experience.
— Reply to this email directly, view it on GitHub https://github.com/DataDog/ansible-datadog/pull/541#issuecomment-1980329301, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFIEAPBGXEBGMSXIPYSYRDTYW3H2HAVCNFSM6AAAAABCCSABC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBQGMZDSMZQGE . You are receiving this because you were mentioned.Message ID: @.***>
@bkabrda This is now ready for review. I have tested on both RHEL and Debian.