markus
rework tags
The Datadog and logging backends support tags, but the API for specifying tags is a little weird, and nothing sanitizes tag keys and values.
This issue covers rethinking that a bit.
Right now, Markus supports something like this:
metrics.incr('somekey', 1, tags=['key:val', 'key2:val'])
That format for tags is weird if we think of tags as always having a key and a value. Datadog doesn't require that, though.
The datadog backend has restrictions on keys and values:
Tags must start with a letter, and after that may contain alphanumerics, underscores, minuses, colons, periods and slashes. Other characters will get converted to underscores. Tags can be up to 200 characters long and support unicode. Tags will be converted to lowercase.
For optimal functionality, we recommend constructing tags that use the key:value syntax. Examples of commonly used metric tag keys are env, instance, name, and role. Note that device, host, and source are "reserved" tag keys and cannot be specified in the standard way.
We store one time series per host + metric + tag combination on our backend, thus we cannot support infinitely bounded tags. Please don't include endlessly growing tags in your metrics, like timestamps or user ids. Please limit each metric to 1000 tags. (https://help.datadoghq.com/hc/en-us/articles/204312749-Getting-started-with-tags)
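To make those rules concrete, here's a rough sketch of what sanitizing a single tag could look like. `sanitize_tag` is a hypothetical name, not something in Markus, and prefixing with "a" when the tag doesn't start with a letter is my own assumption since the Datadog docs don't say what happens in that case.

```python
import re

# Limit taken from the Datadog text quoted above.
MAX_TAG_LENGTH = 200

# Anything that is not alphanumeric, underscore, minus, colon, period, or
# slash. In Python 3, \w matches unicode letters, digits, and underscore.
_BAD_CHARS = re.compile(r"[^\w\-:./]")


def sanitize_tag(tag):
    """Return a cleaned-up copy of ``tag`` (hypothetical helper)."""
    tag = tag.lower()
    tag = _BAD_CHARS.sub("_", tag)
    # Assumption: force a leading letter by prefixing with "a".
    if not tag or not tag[0].isalpha():
        tag = "a" + tag
    return tag[:MAX_TAG_LENGTH]
```

This doesn't handle the reserved keys (device, host, source) or the 1000-tags-per-metric limit; those would need separate checks.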
First, we should fix the Markus docs regarding tags to make those restrictions clearer.
After that, it seems prudent to clean up tags somewhere.
Maybe Markus should clean up tags before sending them to backends? Then all backends get the same tags.
Maybe the backends themselves should clean up tags if they need to? Then we don't clean up tags that don't need cleaning up.
Maybe we do a bit of both?
Maybe we do a utility function that cleans up tags and takes either a list or a dict and returns a list of tags?
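For the last option, a minimal sketch of the list-or-dict idea might look like this. `normalize_tags` and its `clean` parameter are hypothetical names for illustration only; the default `clean` does nothing, so any sanitizing would have to be passed in.

```python
from collections.abc import Mapping


def normalize_tags(tags, clean=lambda tag: tag):
    """Turn a list or dict of tags into a list of tag strings.

    Hypothetical helper: ``clean`` is an optional callable applied to
    each resulting tag string.
    """
    if tags is None:
        return []
    if isinstance(tags, Mapping):
        # A dict value of None yields a key-only tag, since Datadog
        # doesn't require every tag to have a value.
        tags = [
            str(key) if value is None else f"{key}:{value}"
            for key, value in tags.items()
        ]
    return [clean(str(tag)) for tag in tags]
```

With that, `normalize_tags({'key': 'val', 'key2': 'val'})` and `normalize_tags(['key:val', 'key2:val'])` produce the same list, so callers could use whichever shape suits them.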
I decided not to do anything backend-specific or to add automatic tag generation to .incr() and related functions. Instead, I wrote a .generate_tag() utility function. That's in PR #26.
People can use this as they see fit. If they have other requirements, they can handle those in their own code (hashing values is an interesting one). This seems like the most flexible and least intrusive first step. We can adjust as we go along, after people have used it.
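As an example of the hashing case, someone could do something like the following in their own code. `hashed_tag` is a hypothetical helper, not part of Markus, and the choice of SHA-256 truncated to 8 hex characters is just an assumption for the sketch.

```python
import hashlib


def hashed_tag(key, value, length=8):
    """Build a ``key:digest`` tag so the raw value never leaves the app.

    Hypothetical helper. Note that hashing hides the value but does not
    reduce how many distinct tags there are, so the Datadog advice about
    unbounded tags still applies.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return f"{key}:{digest[:length]}"
```

The result is lowercase hex after the colon, so it already fits the character rules for tag values.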