datadog-go
datadog-go copied to clipboard
Commas in tag values are not escaped
Tags are a slice of strings in the documentation, but datadog-go documentation does not mention valid values for tags. datadog-go removes newlines from tags, but does not remove/escape other invalid characters like comma (,). Comma in tag value will be interpreted as multiple tags since tags are comma-separated in the UDP datagrams.
The page at https://docs.datadoghq.com/developers/guide/what-best-practices-are-recommended-for-naming-metrics-and-tags/ lists recommeded practice for naming tags, perhaps datadog-go should provide a function to sanitize the tag values according to those rules?
FYI we implemented our custom sanitizers for tags which follows the best practices from DataDog documentation and I believe that we can create PR after maintainers of this repo confirm that sanitizers should be in this package.
Hi there,
Feel free to open a PR, sanitizing the tags makes sense to me.
I am a bit worried about the potential performance hit, make sure to mesure it if you create one ;).
Arthur
I referenced this from an OpenTelemetry metrics SDK issue in which we are looking to address the same problem. I started poking around hoping that DataDog had already set a convention for how to encode tag values with commas in them. @arbll the problem is that your agent will have to understand the convention that is used, so this issue can't be solved in the client alone.
How does DataDog intend to support tag values with commas in them?
Hi @jmacd, From the data I can gather it seems that commas are not supported at all in tags, even in the backend:
May contain alphanumerics, underscores, minuses, colons, periods, and slashes. Other characters are converted to underscores.
I think in this library it would make sense to sanitize by either removing commas or replacing them by underscores.
@arbll The document linked above, which I believe you are quoting from, describes naming tags, i.e., the valid set of characters permissible in a tag name. Restricting tag names is fine. It does not mention a restriction on possible tag values.
Anyway, in as much as I am currently responsible for the OpenTelemetry metrics specification, I will not approve any such sanitization of tag values and if this is something Datadog won't spec out, we'll probably end up creating a new specification or not supporting dogstatsd out of the box.
One way to spec this out would be to follow the W3C specification for trace context, which specifies how to encode distributed context tags in URL encoding. I.e., commas would be encoded using %2C.
Here is the specification I refer to: https://w3c.github.io/correlation-context/#value-format
Please correct if I'm wrong — from what I see, DataDog Agent doesn't have "tag name" and "tag value" distinction for metrics, those are parsed [0] directly into []string [1], using , as separator, meaning it's always just a single string "tag" (e.g., a valid list of tags: []string{"foo:bar", "baz:hey", "foo"})
Note that according to statsd/dogstatsd spec, "tag names" are not unique, so as per "valid list of tags" example above, tag foo is used twice per metric: once with value, once without. Assuming that's why DataDog uses string slice instead of hashmap for storing tags.
[0] https://github.com/DataDog/datadog-agent/blob/master/pkg/dogstatsd/parser.go#L85 [1] https://github.com/DataDog/datadog-agent/blob/master/pkg/metrics/metric_sample.go#L69