hl: How to create the example.log file used in the performance measurements

The performance section mentions a log file used for testing. Can you provide a link to it, or describe what you used to generate it?

I'd like to try it out with lnav to see how it performs and if there is anything I can improve.

Thanks

Feb 14 '24 20:02 tstack

Hi, unfortunately I cannot share this log file because it contains private and confidential information. I used it to measure performance because it is big enough and quite typical for my everyday use cases.

But I can gather some statistics about it and share them. In the performance section I only shared the total file size and the number of log lines in it. The length of the lines varies from 109 to 763670 bytes. I think this is the most important information describing the source, and most real log files of similar size would show similar performance. If you need additional statistics, I can easily collect them. For example, I've just collected the distribution of line lengths and the distribution of the number of top-level keys per line.

Here is the distribution of the number of keys:

Occurrences	Keys per line
160	16
42	15
70743	14
28024	13
3643	12
9027	11
386409	10
1464520	9
60525	8
117789	7
3813411	6
17038	5

The data was collected using the following command:

jq 'length' example.log | sort -rn | uniq -c >example.nkeys

Every line has at least 5 keys: "level", "ts", "logger", "msg", "caller".
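For anyone without jq at hand, the same statistic can be approximated in Python. This is only a sketch of the idea behind the command above, assuming every line is a single JSON object; the function name and the sample lines are made up for illustration.

```python
import json
from collections import Counter


def key_count_distribution(lines):
    """Count how many lines have each number of top-level keys."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumes each line is a JSON object, so len() is its key count.
        counts[len(json.loads(line))] += 1
    # Highest key count first, mirroring `sort -rn | uniq -c`.
    return sorted(counts.items(), reverse=True)


# Hypothetical sample lines, not taken from the real example.log.
sample = [
    '{"level":"info","ts":"2024-02-14T20:02:00Z","logger":"app","msg":"started","caller":"main.go:10"}',
    '{"level":"info","ts":"2024-02-14T20:02:01Z","logger":"app","msg":"request","caller":"http.go:42","status":200}',
    '{"level":"warn","ts":"2024-02-14T20:02:02Z","logger":"app","msg":"slow","caller":"http.go:57"}',
]

for keys, occurrences in key_count_distribution(sample):
    print(f"{occurrences}\t{keys}")
```

For a real file, passing an open file object (for example `open("example.log", encoding="utf-8")`) instead of `sample` should work the same way, since it is iterated line by line.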

Here is the distribution of line lengths: example.len.zip. The data was collected using the following command:

awk '{print length}' example.log | sort -rn | uniq -c >example.len
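A rough Python equivalent of the awk pipeline is sketched below, in case it helps with comparing a synthetic file against these numbers. It measures lengths in bytes explicitly, because awk's `length` may count characters rather than bytes depending on the awk implementation and locale. The function name and the temporary demo file are assumptions for illustration only.

```python
import os
import tempfile
from collections import Counter


def line_length_distribution(path):
    """Return (length_in_bytes, occurrences) pairs, longest lines first."""
    counts = Counter()
    with open(path, "rb") as f:  # binary mode, so lengths are in bytes
        for raw in f:
            counts[len(raw.rstrip(b"\n"))] += 1  # exclude the newline itself
    return sorted(counts.items(), reverse=True)


# Small self-contained demo on a throwaway file (not the real example.log).
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b'{"a":1}\n{"b":2}\n{"msg":"longer"}\n')

demo = line_length_distribution(tmp.name)
os.remove(tmp.name)

for length, occurrences in demo:
    print(f"{occurrences}\t{length}")
```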

Feel free to ask me to collect any additional statistics. You can try writing a script to generate some synthetic data, or I can try to put one together when I have time.
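As a starting point, a synthetic generator could look roughly like the sketch below. It only reproduces two things stated in this thread: the five keys present in every line and the distribution of keys per line from the table above. Everything else is an assumption made up for illustration: the value formats, the `fieldN` names for extra keys, the message text, and the function name. It does not try to match the real line-length distribution, so performance on its output may differ from the original file.

```python
import json
import random
from datetime import datetime, timedelta, timezone

# Occurrences per number of top-level keys, copied from the table above.
KEY_COUNT_WEIGHTS = {
    16: 160, 15: 42, 14: 70743, 13: 28024, 12: 3643, 11: 9027,
    10: 386409, 9: 1464520, 8: 60525, 7: 117789, 6: 3813411, 5: 17038,
}

# Keys reported to be present in every line.
REQUIRED_KEYS = ("level", "ts", "logger", "msg", "caller")


def generate_lines(n, seed=0):
    """Yield n synthetic JSON log lines (all values are invented)."""
    rng = random.Random(seed)
    sizes = list(KEY_COUNT_WEIGHTS)
    weights = list(KEY_COUNT_WEIGHTS.values())
    ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for _ in range(n):
        ts += timedelta(milliseconds=rng.randint(1, 50))
        record = {
            "level": rng.choice(["debug", "info", "warn", "error"]),
            "ts": ts.isoformat(),
            "logger": rng.choice(["api", "db", "worker"]),
            "msg": "synthetic message",
            "caller": f"pkg/file.go:{rng.randint(1, 999)}",
        }
        # Pick a total key count following the observed distribution,
        # then pad the record with hypothetical extra fields.
        n_keys = rng.choices(sizes, weights=weights)[0]
        for i in range(n_keys - len(REQUIRED_KEYS)):
            record[f"field{i}"] = rng.randint(0, 10**6)
        yield json.dumps(record, separators=(",", ":"))


for line in generate_lines(3):
    print(line)
```

Writing the generator's output to a file line by line, with a much larger `n`, would give a file that can be fed to any of the viewers; matching the real line lengths would need the attached length distribution as an additional input.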

Feb 14 '24 22:02 pamburus

I found another open dataset, transformed it slightly and made the measurements.

Source file

Web robot detection - Server logs

Transformation command

pv -c -N input <public_v2.json | jq -c 'to_entries[] | {"request-id": .key} + .value + {"response": (.value.response | tonumber), "bytes": (.value.bytes | tonumber), level: "info"}' | (pv -c -N output >web-robot.log)
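To make the jq filter easier to follow, here is a small Python sketch of what it appears to do: each top-level entry becomes one line, the entry's key is stored as "request-id", "response" and "bytes" are converted to numbers, and "level" is set to "info". The input shape is inferred from the filter itself, and the sample record (including its "host" field) is invented, so treat it as an illustration rather than a description of the actual dataset.

```python
import json


def to_number(value):
    """Rough stand-in for jq's `tonumber`: numbers pass through, numeric strings are parsed."""
    if isinstance(value, (int, float)):
        return value
    try:
        return int(value)
    except ValueError:
        return float(value)


def transform(doc):
    """Yield one compact JSON line per top-level entry of doc."""
    for request_id, record in doc.items():
        out = {"request-id": request_id, **record}
        out["response"] = to_number(record["response"])
        out["bytes"] = to_number(record["bytes"])
        out["level"] = "info"
        yield json.dumps(out, separators=(",", ":"), ensure_ascii=False)


# Hypothetical input; the real public_v2.json fields may differ.
sample = {"req-1": {"host": "10.0.0.1", "response": "200", "bytes": "5120"}}

for line in transform(sample):
    print(line)
```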

Notes

hlogf 1.4.1 had trouble parsing this file, so it was excluded.

Measurements

graph_3249171545.pdf

Raw details

❯ hl --version
hl 0.29.5

❯ time hl web-robot.log -c -o /dev/null
hl web-robot.log -c -o /dev/null  12.06s user 0.72s system 930% cpu 1.374 total

# ---

❯ humanlog --version
humanlog version 0.7.6+deb0543

❯ time humanlog <web-robot.log --color always >/dev/null
humanlog> reading stdin...
humanlog --color always < web-robot.log > /dev/null  92.02s user 4.38s system 108% cpu 1:28.76 total

# ---

❯ fblog --version
fblog 4.10.0

❯ time fblog web-robot.log >/dev/null
fblog web-robot.log > /dev/null  25.85s user 1.65s system 98% cpu 28.001 total

❯ time fblog -d web-robot.log >/dev/null
fblog -d web-robot.log > /dev/null  131.54s user 14.14s system 99% cpu 2:26.33 total

# ---

❯ wc -cl web-robot.log
 4091155 3312769299 web-robot.log

❯ sysctl -n machdep.cpu.brand_string
Apple M1 Max
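For a quick comparison, the raw numbers above can be turned into throughput and relative wall-clock time. The sketch below just does that arithmetic, using the byte count from `wc` and the "total" column from each `time` output; the dictionary labels and function name are only illustrative.

```python
FILE_BYTES = 3_312_769_299  # from `wc -cl web-robot.log`

# Wall-clock ("total") times from the `time` outputs above, in seconds.
WALL_SECONDS = {
    "hl -c": 1.374,
    "humanlog --color always": 1 * 60 + 28.76,
    "fblog": 28.001,
    "fblog -d": 2 * 60 + 26.33,
}


def throughput_mb_per_s(seconds, size=FILE_BYTES):
    """Throughput in MB/s (1 MB = 10**6 bytes)."""
    return size / seconds / 1e6


baseline = WALL_SECONDS["hl -c"]
for tool, seconds in WALL_SECONDS.items():
    print(
        f"{tool:26s} {throughput_mb_per_s(seconds):8.1f} MB/s "
        f"{seconds / baseline:6.1f}x hl wall time"
    )
```

Note that this compares wall-clock time only; hl ran at roughly 930% CPU while the other tools stayed near 100%, so CPU time per byte is a separate comparison.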

May 31 '24 08:05 pamburus