datadog-agent icon indicating copy to clipboard operation
datadog-agent copied to clipboard

Enable peer tags aggregation by default

Open rickbatka opened this issue 1 year ago • 3 comments
trafficstars

https://datadoghq.atlassian.net/browse/APMST-1371

What does this PR do?

Flips default setting to "true" for three flags:

  • apm_config.peer_service_aggregation
  • apm_config.peer_tags_aggregation
  • apm_config.compute_stats_by_span_kind

Motivation

The intent is to enable peer tags aggregation (and the related "stats by span kind" aggregation) as the default option, effectively moving from "Opt-In" to "Opt-Out" for the Service Representation project (docs here). This project is moving toward General Availability and we would like to land this before the Nov 1 code freeze.

This is a non-breaking change: all back-end data and pipelines are already updated, and inferred entities are appearing throughout APM. This setting ensures that those inferred entities will have stats properly aggregated and attributed to them.

Describe how to test/QA your changes

I am unsure how to thoroughly test the trace agent. I have updated and added unit tests, but I would greatly appreciate some help confirming the changes or devising a test plan to help confirm the changes!

Possible Drawbacks / Trade-offs

This shouldn't have any breaking implications for our customers.

However, as the comments in the template config suggest, the additional aggregation could cause resource usage to increase. We have a few hundred customers using the feature today, but it is possible a wide rollout could expose more performance issues. I'm also looking for guidance from the team on if / how we need to address any such performance regression testing.

Additional Notes

I have never worked on the trace agent before, and I would greatly appreciate any feedback that will help me get this PR up to snuff!

I have not yet written the release notes - I will do that soon. For now, I'm seeking advice on testing and and a pre-review of this draft PR!

rickbatka avatar Oct 17 '24 15:10 rickbatka

CLA assistant check
All committers have signed the CLA.

bits-bot avatar Oct 17 '24 15:10 bits-bot

Test changes on VM

Use this command from test-infra-definitions to manually test this PR changes on a VM:

inv create-vm --pipeline-id=47622610 --os-family=ubuntu

Note: This applies to commit 4d290f77

Regression Detector

cit-pr-commenter[bot] avatar Oct 17 '24 18:10 cit-pr-commenter[bot]

Rebased. Once all test pass, will be ready to merge.

rickbatka avatar Oct 24 '24 13:10 rickbatka

/merge

rickbatka avatar Oct 25 '24 12:10 rickbatka

:steam_locomotive: MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals. Note: if you pushed new commits since the last approval, you may need additional approval. You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

dd-devflow[bot] avatar Oct 25 '24 12:10 dd-devflow[bot]

:warning: MergeQueue: This merge request was unqueued

This merge request was unqueued

If you need support, contact us on Slack #devflow!

dd-devflow[bot] avatar Oct 25 '24 16:10 dd-devflow[bot]

@pgimalac

I'm not completely sure I understand your comment but if you set a default value then IsSet always returns true, so right now the if is always true

I'm trying to set a default in case the flag is unset, but to give the customer a way to opt-out by explicitly setting the flag to false. Is that not possible given this setup? My unit tests made me think this would work.

rickbatka avatar Oct 28 '24 13:10 rickbatka

/merge

rickbatka avatar Oct 28 '24 18:10 rickbatka

:steam_locomotive: MergeQueue: pull request added to the queue

The median merge time in main is 23m.

Use /merge -c to cancel this operation!

dd-devflow[bot] avatar Oct 28 '24 18:10 dd-devflow[bot]