Sampling interface
Hey, I was wondering if we could code a sampling interface. It would be nice to have an intermediate writer, so I can hash or count things in the log entry before writing it with the configured writer. I can't access the Entry fields directly since they're stored in a byte buffer; I can only use e.Value(), which returns the raw JSON bytes.
edit: oh, I see now that the output is built step by step as a JSON string, so we can't get variables like msg or level separately.
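Roughly what I mean, as a sketch against the Writer interface from the README (the CountingWriter name and the counter layout are made up, and it only counts the raw line):

```go
package main

import (
	"hash/fnv"
	"sync/atomic"

	"github.com/phuslu/log"
)

// CountingWriter is a made-up intermediate writer: it hashes and counts each
// finished JSON line, then hands the entry to the configured writer.
type CountingWriter struct {
	Next   log.Writer      // whatever writer was configured before wrapping
	counts [1 << 16]uint64 // counters bucketed by a 16-bit hash of the line
}

func (w *CountingWriter) WriteEntry(e *log.Entry) (int, error) {
	// e.Value() is the raw JSON bytes of the entry. Note that if the logger
	// writes a time field, every line hashes differently; keying on something
	// stable like msg would need parsing, which is exactly the problem here.
	h := fnv.New32a()
	h.Write(e.Value())
	atomic.AddUint64(&w.counts[h.Sum32()&0xffff], 1)
	return w.Next.WriteEntry(e)
}

func main() {
	// Wrap whatever writer is already configured with the counting layer.
	log.DefaultLogger.Writer = &CountingWriter{Next: log.DefaultLogger.Writer}
	log.Info().Str("foo", "bar").Msg("hello")
}
```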
Yes, marshal to JSON directly for max performance, as the zerolog author also said.
By default you can only get the level and the JSON value, see https://github.com/phuslu/log?tab=readme-ov-file#customize-the-log-writer
But if you don't care much about performance, you could use ConsoleWriter to unmarshal it, see https://github.com/phuslu/log?tab=readme-ov-file#consolewriter
For even more control, you could use the internal parseFormatterArgs function, see https://github.com/zzf2333/clash/blob/5e19f9b292e6ff74f60b090a11d16dde01e8e109/log/log.go#L134-L215
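For example, if performance doesn't matter much, a wrapper writer can simply json.Unmarshal e.Value() itself (this sketch uses plain encoding/json instead of ConsoleWriter, just to show the slow path; slowWriter is a made-up name):

```go
package main

import (
	"encoding/json"

	"github.com/phuslu/log"
)

// slowWriter is a made-up writer that decodes each entry with encoding/json
// so individual fields become addressable. It allocates on every log call,
// so it is only for the "don't care about performance" case.
type slowWriter struct {
	next log.Writer
}

func (w *slowWriter) WriteEntry(e *log.Entry) (int, error) {
	var fields map[string]any
	if err := json.Unmarshal(e.Value(), &fields); err == nil {
		_ = fields["level"] // e.g. look at the level...
		_ = fields          // ...or any other field, under whatever names the logger uses
	}
	return w.next.WriteEntry(e)
}

func main() {
	log.DefaultLogger.Writer = &slowWriter{next: log.DefaultLogger.Writer}
	log.Info().Int("answer", 42).Msg("decoded the slow way")
}
```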
Uhm, the sad thing is that I need performance, around 150k executions per second. Maybe I can modify it to get an unsafe.Slice pointing to elements inside buf? And another slice, allocated only once, where each slot holds a pointer to a key via unsafe.Slice, maybe another variable to get pointers to msg directly, wrapping them in unsafe.String. Just brainstorming.
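Something like this is what I'm picturing, purely as a sketch of the zero-copy idea with unsafe.String / unsafe.SliceData (Go 1.20+); the scanning below is naive, doesn't handle escaped quotes, and the "msg" key name is just for the example:

```go
package main

import (
	"bytes"
	"fmt"
	"unsafe"
)

// view returns a string backed directly by b, with no copy (Go 1.20+).
// It is only valid while the underlying buffer stays alive and unmodified.
func view(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// msgValue naively locates the value of a top-level "msg" key in a JSON line
// and returns a zero-copy view of it. It ignores escaped quotes, so it only
// sketches the idea; it is not a correct JSON scanner.
func msgValue(line []byte) string {
	const key = `"msg":"`
	i := bytes.Index(line, []byte(key))
	if i < 0 {
		return ""
	}
	rest := line[i+len(key):]
	j := bytes.IndexByte(rest, '"')
	if j < 0 {
		return ""
	}
	return view(rest[:j])
}

func main() {
	line := []byte(`{"time":"2024-01-01T00:00:00Z","level":"info","msg":"hello"}`)
	fmt.Println(msgValue(line)) // prints: hello (no copy out of line)
}
```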
Besides using "unsafe magic", my thoughts are below:
- The normal way is to turn to the stdlib "log/slog", because it has a "slog.Record" type in the middle layer. The disadvantage is that its JSON handler is a bit slow. You could consider replacing the JSON handler with "phuslu/log.SlogNewJSONHandler()" -- it's the fastest slog JSON handler in the world, see https://madkins23.github.io/go-slog/scores/Simple/summary.html (a sketch of this middleware pattern is after this list).
- If performance is first class (e.g. 150K), I think there's no silver bullet. Even the "unsafe magic" will also slow down your QPS/TPS. You could consider having an external program outside of the main process do the log processing (e.g. vector.dev).
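For option 1, the counting/sampling piece would be a slog.Handler middleware that sees the structured slog.Record before any JSON is rendered. A minimal sketch, using the stdlib JSON handler (swap in the phuslu/log slog handler for speed; countingHandler is a made-up name):

```go
package main

import (
	"context"
	"log/slog"
	"os"
	"sync/atomic"
)

// countingHandler is a made-up slog.Handler middleware. It sees the structured
// slog.Record (message, level, attrs) before the inner handler renders JSON,
// which is where hashing / counting / sampling can live.
type countingHandler struct {
	inner slog.Handler
	seen  *atomic.Uint64 // shared across WithAttrs/WithGroup copies
}

func newCountingHandler(inner slog.Handler) *countingHandler {
	return &countingHandler{inner: inner, seen: new(atomic.Uint64)}
}

func (h *countingHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

func (h *countingHandler) Handle(ctx context.Context, r slog.Record) error {
	h.seen.Add(1) // r.Message, r.Level, r.Time and r.Attrs(...) are available here
	return h.inner.Handle(ctx, r)
}

func (h *countingHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return &countingHandler{inner: h.inner.WithAttrs(attrs), seen: h.seen}
}

func (h *countingHandler) WithGroup(name string) slog.Handler {
	return &countingHandler{inner: h.inner.WithGroup(name), seen: h.seen}
}

func main() {
	// stdlib JSON handler here; the suggestion above is to swap in the
	// phuslu/log slog JSON handler for speed.
	logger := slog.New(newCountingHandler(slog.NewJSONHandler(os.Stdout, nil)))
	logger.Info("hello", "foo", "bar")
}
```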
Yeah, I will try option 1 when I get to work on the logger part. I already did the benchmarks, very good indeed.
The problem with option 2 is complexity. I think that by attaching counters to the most repeated logs I can just go the easy route and check on that before letting the log through.
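E.g. something like a per-call-site counter checked before the log call (the function name and thresholds are made up for illustration):

```go
package main

import (
	"sync/atomic"

	"github.com/phuslu/log"
)

// hotPathHits is a per-call-site counter for a log statement known to be hot.
var hotPathHits atomic.Uint64

func handleRequest() {
	// Check the counter before letting the log through: always log the first
	// 10 occurrences, then only every 1000th.
	if n := hotPathHits.Add(1); n <= 10 || n%1000 == 0 {
		log.Info().Uint64("occurrence", n).Msg("hot path hit")
	}
}

func main() {
	for i := 0; i < 5000; i++ {
		handleRequest()
	}
}
```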
Thanks.