l2met
Feature request: count unique things seen
I would like to have l2met produce a "unique things seen" metric from my log lines. Say I have an app that logs lines like this:
user_id=3
user_id=6
user_id=3
user_id=3
user_id=6
user_id=2
I would like a metric that counts how many unique user_id values were logged, in this case 3. A possible convention:
unique#user-id=1
unique#user-id=1
unique#user-id=3
unique#user-id=2
unique#user-id=3
unique#user-id=1
These lines would cause a user-id metric to be emitted every interval, and again in this case the value would be 3 based on these lines.
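As a rough illustration of the convention proposed above, here is a minimal Python sketch that counts distinct values per `unique#` key across the lines of one interval. The function name `count_unique` and the whitespace-based token splitting are assumptions made for this example; this is not l2met's parser or an existing l2met feature.

```python
from collections import defaultdict

UNIQUE_PREFIX = "unique#"  # prefix proposed in this issue, not an existing l2met feature


def count_unique(lines):
    """Return {metric name: number of distinct values} for one interval's lines."""
    seen = defaultdict(set)
    for line in lines:
        # Assumption: logfmt-style tokens separated by whitespace, unquoted values.
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and key.startswith(UNIQUE_PREFIX):
                seen[key[len(UNIQUE_PREFIX):]].add(value)
    return {name: len(values) for name, values in seen.items()}


lines = [
    "unique#user-id=1",
    "unique#user-id=1",
    "unique#user-id=3",
    "unique#user-id=2",
    "unique#user-id=3",
    "unique#user-id=1",
]
print(count_unique(lines))  # {'user-id': 3}
```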
Neat idea. I've wondered where the line gets drawn between technical metrics and user analytics, but this is something we might use too if it were available. IP addresses, OAuth client IDs, and maybe User-Agent strings would be things we might track like this, along with authenticated user IDs.
@dpiddy to better predict the worst-case memory usage, is there some sane max boundary on key length, e.g. 128 or 256 bytes?
I think maybe even 64 bytes would be a reasonable limit, that would handle a sha256 sum.
@josephruscio wouldn't value length be more important as you basically need to make a set of values for the key and then count them at the bucket boundary?
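The "set of values per key, counted at the bucket boundary" idea could be sketched roughly as follows. The class `UniqueBuckets`, its method names, and the 60-second default interval are hypothetical and chosen only for illustration; l2met's actual bucket handling may look quite different.

```python
from collections import defaultdict


class UniqueBuckets:
    """Collect distinct values per (interval, metric name); count them on flush."""

    def __init__(self, interval_seconds=60):
        self.interval = interval_seconds
        self.sets = defaultdict(set)  # (bucket_start, name) -> set of values seen

    def add(self, timestamp, name, value):
        # Round the timestamp down to the start of its interval.
        bucket_start = int(timestamp) - int(timestamp) % self.interval
        self.sets[(bucket_start, name)].add(value)

    def flush(self, now):
        """Return (bucket_start, name, count) for buckets that ended at or before now."""
        ready = sorted(k for k in self.sets if k[0] + self.interval <= now)
        # pop() frees the set, so memory is held only for open buckets.
        return [(start, name, len(self.sets.pop((start, name)))) for start, name in ready]


buckets = UniqueBuckets(60)
for ts, user_id in [(0, "3"), (10, "6"), (20, "3"), (61, "2")]:
    buckets.add(ts, "user-id", user_id)
print(buckets.flush(60))   # [(0, 'user-id', 2)]
print(buckets.flush(120))  # [(60, 'user-id', 1)]
```

In this sketch each set stores one copy of every distinct value until the bucket is flushed, so a cap on value length bounds the cost per entry but not the number of entries.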
Yeah I think the value is what we're talking about. @dpiddy agree that 64 bytes would be reasonable. Larger values can just be hashed down to that, and would be preferable when/if the values for the logfmt tuples had whitespace. For example,
unique#full-name=Collin Van Dyck
Probably wouldn't be that useful in a unique tag but
unique#full-name=ac4780ff03d9df8fb067c67681e772d4
would be.
Yeah. I'd say the limit should be 64 bytes and the recommended usage when you are not sure if your value will be under that is to hash.
Another option would be to have l2met always hash but that has tradeoffs too.
For my immediate use case for this feature I only need 10 bytes or less.
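The "hash when you are not sure" recommendation could look something like this on the client side. The helper `normalize_value` is hypothetical, and the choice of SHA-256 is an assumption based on the sha256 mention earlier in the thread (its hex digest is exactly 64 characters); values containing whitespace are also hashed, following the full-name example.

```python
import hashlib

MAX_VALUE_BYTES = 64  # limit suggested in this thread, not an enforced l2met setting


def normalize_value(value, limit=MAX_VALUE_BYTES):
    """Return value unchanged if it is short and whitespace-free, else its SHA-256 hex digest."""
    raw = value.encode("utf-8")
    if len(raw) <= limit and not any(ch.isspace() for ch in value):
        return value
    return hashlib.sha256(raw).hexdigest()  # always 64 hex characters


print("unique#user-id=" + normalize_value("3"))
print("unique#full-name=" + normalize_value("Collin Van Dyck"))
```

Hashing on the client keeps the log line a single logfmt token. Having l2met always hash, as floated above, would move that cost to the server and make the raw values unrecoverable from its state.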
On Thursday, October 31, 2013, Collin Van Dyck wrote:
Yeah I think the value is what we're talking about. @dpiddy agree that 64 bytes would be reasonable. Larger values can just be hashed down to that, and would be preferable if the values for the logfmt tuples had whitespace. For example,
unique#full-name=Collin Van Dyck
Probably wouldn't be that useful in a unique tag but
unique#full-name=ac4780ff03d9df8fb067c67681e772d4
would be.
@collinvandyck @freeformz correct, I accidentally used the wrong term; max length for any value in the set was my concern.