processes: use multiple process_context tags

Open amarziali opened this issue 1 year ago • 3 comments

What does this PR do?

This PR prepares the ground for returning multiple service_context tags when metadata extraction is done.

Although this is not a breaking change at the code level (the function GetServiceContext already supported returning multiple tags), it can be a functional breaking change for USM, which should handle this multi-service use case.

Motivation

Java application servers are single processes that host several services from the APM point of view. Those services should be reported, at least for service catalogue autodetection.

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

amarziali, Mar 14 '24 10:03

> In the current implementation, is there anything enforcing the order of the process_context tags when they are sent to the backend?

There is no concept of order among the discovered services. However, if you need to pick one deterministically, I can sort them alphabetically. Does USM need this kind of ordering?

amarziali, Mar 14 '24 12:03
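The alphabetical sort suggested in the comment above can be sketched as follows. This is a minimal illustration, not the agent's code: `contextTags` is a hypothetical helper, and the `process_context:<name>` tag format is assumed from the tag name used in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// contextTags is a hypothetical helper (not part of the agent's API).
// It builds one "process_context:<name>" tag per discovered service and
// returns the tags sorted alphabetically, so the result does not depend
// on the order in which the services were discovered.
func contextTags(services []string) []string {
	tags := make([]string, 0, len(services))
	for _, s := range services {
		tags = append(tags, "process_context:"+s)
	}
	sort.Strings(tags)
	return tags
}

func main() {
	// The same three services, discovered in two different orders.
	fmt.Println(contextTags([]string{"d", "b", "c"}))
	fmt.Println(contextTags([]string{"c", "d", "b"}))
	// Both lines print: [process_context:b process_context:c process_context:d]
}
```

Sorting makes the order independent of discovery order for a fixed set of services; it does not say anything about what happens when the set itself changes.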

Lexicographic order does not ensure determinism.

Assume that with agent version 7.53 we report services b, c and d for a certain process. In 7.54 we improve the algorithm, so we also report a along with b, c and d.

guyarb, Mar 14 '24 12:03
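The scenario in the comment above can be made concrete with a small sketch. It assumes a hypothetical consumer that always picks the alphabetically first service; `firstSorted` is an illustrative name, not an agent function.

```go
package main

import (
	"fmt"
	"sort"
)

// firstSorted returns the alphabetically first service name, i.e. what a
// hypothetical consumer would get by always taking element 0 of the
// sorted list. It returns "" for an empty input.
func firstSorted(services []string) string {
	if len(services) == 0 {
		return ""
	}
	sorted := append([]string(nil), services...) // copy, keep input intact
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	v753 := []string{"b", "c", "d"}      // services reported by 7.53 in the example
	v754 := []string{"b", "c", "d", "a"} // 7.54 additionally discovers "a"

	fmt.Println(firstSorted(v753)) // prints "b"
	fmt.Println(firstSorted(v754)) // prints "a"
}
```

For the same process, the picked service changes from b to a between the two versions even though both lists are sorted.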

> Lexicographic order does not ensure determinism.
>
> Assume that with agent version 7.53 we report services b, c and d for a certain process. In 7.54 we improve the algorithm, so we also report a along with b, c and d.

I need to sort the discovered service names anyway: they are reported in the order they are found, so if the user merely reorders them in the server configuration, the reported order changes too.

We could put some kind of versioning on that metadata to cope with algorithm changes across agent versions. In my opinion, we can defer implementing a more complex approach until we need it.

I do not foresee changes driven by the way enterprise servers deploy apps, since that has not changed in decades. However, it is true that improving or fixing the algorithm may change that order. That may or may not be acceptable for a feature in beta. In any case, I propose putting something in place only when the problem arises.

amarziali, Mar 14 '24 14:03