l2met Feature request: count unique things seen

I would like to have l2met produce a "unique things seen" metric from my log lines. Say I have an app that logs lines like this:

user_id=3
user_id=6
user_id=3
user_id=3
user_id=6
user_id=2

I would like a metric that counts how many unique user_ids were logged, in this case 3. A possible convention:

unique#user-id=1
unique#user-id=1
unique#user-id=3
unique#user-id=2
unique#user-id=3
unique#user-id=1

These lines would cause a user-id metric to be emitted every interval. Given the lines above, its value would again be 3.
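
To make the proposed convention concrete, here is a minimal sketch in Go of how the counting could work. It is not l2met code: the function name `countUnique` and the whitespace-based token splitting are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// countUnique scans log lines for tokens of the form unique#<name>=<value>
// and returns, for each name, how many distinct values were seen.
// Hypothetical helper for illustration; not part of l2met.
func countUnique(lines []string) map[string]int {
	seen := make(map[string]map[string]struct{})
	for _, line := range lines {
		for _, tok := range strings.Fields(line) {
			if !strings.HasPrefix(tok, "unique#") {
				continue
			}
			kv := strings.SplitN(strings.TrimPrefix(tok, "unique#"), "=", 2)
			if len(kv) != 2 || kv[0] == "" {
				continue // not a name=value pair
			}
			if seen[kv[0]] == nil {
				seen[kv[0]] = make(map[string]struct{})
			}
			seen[kv[0]][kv[1]] = struct{}{}
		}
	}
	counts := make(map[string]int, len(seen))
	for name, values := range seen {
		counts[name] = len(values)
	}
	return counts
}

func main() {
	lines := []string{
		"unique#user-id=1",
		"unique#user-id=1",
		"unique#user-id=3",
		"unique#user-id=2",
		"unique#user-id=3",
		"unique#user-id=1",
	}
	fmt.Println(countUnique(lines)["user-id"]) // prints 3
}
```

The six example lines contain the values 1, 2 and 3, so the emitted user-id metric would be 3.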

Oct 29 '13 18:10 danp

Neat idea. I've wondered where the line gets drawn between technical metrics and user analytics, but this is something we might use too if it were available. IP addresses, OAuth client IDs, and maybe User-Agent strings would be things we might track like this, along with authenticated user IDs.

Oct 30 '13 22:10 aseemk

@dpiddy to better predict the worst-case memory usage, is there some sane upper bound on key length, e.g. 128 or 256 bytes?

Oct 31 '13 04:10 josephruscio

I think maybe even 64 bytes would be a reasonable limit. That would fit a hex-encoded SHA-256 sum.

Oct 31 '13 14:10 danp

@josephruscio wouldn't value length be more important, since you basically need to build a set of values for the key and then count them at the bucket boundary?
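
The "build a set per key, count at the bucket boundary" idea can be sketched roughly as follows. The `uniqueBucket` type and its `Add` and `Flush` methods are hypothetical names, not l2met's API, and the sketch ignores locking, which a real implementation receiving lines concurrently would need.

```go
package main

import "fmt"

// uniqueBucket holds, for one flush interval, the set of distinct values
// seen for each metric name. Hypothetical type; not part of l2met.
type uniqueBucket struct {
	sets map[string]map[string]struct{}
}

func newUniqueBucket() *uniqueBucket {
	return &uniqueBucket{sets: make(map[string]map[string]struct{})}
}

// Add records one observed value for the named metric.
func (b *uniqueBucket) Add(name, value string) {
	if b.sets[name] == nil {
		b.sets[name] = make(map[string]struct{})
	}
	b.sets[name][value] = struct{}{}
}

// Flush is meant to be called at the bucket boundary. It returns the number
// of distinct values per name and starts the next interval with empty sets.
func (b *uniqueBucket) Flush() map[string]int {
	counts := make(map[string]int, len(b.sets))
	for name, set := range b.sets {
		counts[name] = len(set)
	}
	b.sets = make(map[string]map[string]struct{})
	return counts
}

func main() {
	b := newUniqueBucket()
	for _, id := range []string{"3", "6", "3", "3", "6", "2"} {
		b.Add("user-id", id)
	}
	fmt.Println(b.Flush()["user-id"]) // prints 3

	b.Add("user-id", "6")
	fmt.Println(b.Flush()["user-id"]) // prints 1: the set was reset at the boundary
}
```

Because every distinct value is stored until the flush, memory per bucket grows with the number of distinct values times their length, which is why a cap on value length makes the worst case easier to predict.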

Oct 31 '13 17:10 freeformz

Yeah, I think the value is what we're talking about. @dpiddy, agreed that 64 bytes would be reasonable. Larger values can just be hashed down to that, and hashing would also be preferable if the values in the logfmt tuples contained whitespace. For example,

unique#full-name=Collin Van Dyck

Probably wouldn't be that useful in a unique tag but

unique#full-name=ac4780ff03d9df8fb067c67681e772d4

would be.

Oct 31 '13 21:10 collinvandyck

Yeah. I'd say the limit should be 64 bytes, and the recommended usage when you're not sure your value will fit under that is to hash it.

Another option would be to have l2met always hash, but that has tradeoffs too.

For my immediate use case for this feature I only need 10 bytes or less.
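
As a rough illustration of the "hash when unsure" recommendation, an application could normalize values before logging them. This is a sketch under assumptions: `normalizeValue` and `maxValueLen` are made-up names, and SHA-256 with hex encoding is just one choice that happens to produce exactly 64 characters.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// maxValueLen is the 64-byte limit suggested in this thread (an assumption,
// not an l2met constant).
const maxValueLen = 64

// normalizeValue returns v unchanged when it fits within the limit and
// otherwise replaces it with its hex-encoded SHA-256 sum (64 characters).
// Hypothetical client-side helper; not part of l2met.
func normalizeValue(v string) string {
	if len(v) <= maxValueLen {
		return v
	}
	sum := sha256.Sum256([]byte(v))
	return hex.EncodeToString(sum[:])
}

func main() {
	short := "3"
	long := strings.Repeat("x", 200)

	fmt.Println(normalizeValue(short))     // prints 3
	fmt.Println(len(normalizeValue(long))) // prints 64
}
```

Short values stay readable in the logs, and only oversized ones become opaque hashes. Values containing whitespace would still need to be hashed or quoted by the application regardless of length.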

Oct 31 '13 22:10 danp

@collinvandyck @freeformz correct, I accidentally used the wrong term. My concern was the max length of any value in the set.

Oct 31 '13 22:10 josephruscio
