How to create the example.log file used in the performance measurements
The performance section mentions a log file used for testing. Can you provide a link to it, or describe what you used to generate it?
I'd like to try it out with lnav to see how it performs and if there is anything I can improve.
Thanks
Hi, unfortunately I cannot share this log file because it contains private and confidential information. I used it to measure performance because it is big enough and quite typical for my everyday use cases.
But I can gather some statistics about it and share them. In the performance section I shared only the total file size and the number of log lines; line lengths vary from 109 to 763670 bytes. I think this is the most important information describing the source, and most real log files of similar size should show similar performance. If you need additional statistics, I can easily collect them. For example, I've just collected the distribution of line lengths and the distribution of the number of top-level keys per line.
Here is the distribution of the number of keys:
| Occurrences | Keys per line |
|---|---|
| 160 | 16 |
| 42 | 15 |
| 70743 | 14 |
| 28024 | 13 |
| 3643 | 12 |
| 9027 | 11 |
| 386409 | 10 |
| 1464520 | 9 |
| 60525 | 8 |
| 117789 | 7 |
| 3813411 | 6 |
| 17038 | 5 |
The data was collected using the following command:
```sh
jq 'length' example.log | sort -rn | uniq -c >example.nkeys
```
Every line contains at least these 5 keys: "level", "ts", "logger", "msg", "caller".
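For anyone without jq at hand, the same tally can be sketched in Python. This is only an illustrative equivalent, not the command that produced the table: the function name `key_count_distribution` is made up here, and it assumes every non-blank line is a single JSON object.

```python
import json
from collections import Counter


def key_count_distribution(lines):
    """Count how many lines have each number of top-level keys."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: each line is a JSON object, so len() is its key count.
        counts[len(json.loads(line))] += 1
    return counts


# Tiny made-up sample, not taken from example.log.
sample = ['{"level":"info","msg":"a"}', '{"level":"info"}', '{"ts":1,"msg":"b"}']
dist = key_count_distribution(sample)

# Print "occurrences keys" with the largest key count first,
# similar in shape to the `sort -rn | uniq -c` output.
for nkeys, occurrences in sorted(dist.items(), reverse=True):
    print(f"{occurrences:>8} {nkeys}")
```

To run it on a real file, pass an open file object instead of the sample list, since iterating over a file yields its lines.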
Here is the distribution of line lengths: example.len.zip. The data was collected using the following command:
```sh
awk '{print length}' example.log | sort -rn | uniq -c >example.len
```
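One caveat if you want to reproduce this on your own files: awk's `length` may count characters rather than bytes, depending on the implementation and locale. A minimal Python sketch that always counts bytes could look like this (the function name `line_length_distribution` is hypothetical):

```python
from collections import Counter


def line_length_distribution(path):
    """Count lines by their length in bytes, excluding the trailing newline."""
    counts = Counter()
    with open(path, "rb") as f:  # binary mode, so lengths are bytes, not characters
        for raw in f:
            counts[len(raw.rstrip(b"\n"))] += 1
    return counts
```

`min(counts)` and `max(counts)` on the result then give the shortest and longest line lengths.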
Feel free to ask me to collect any additional statistics. You can try writing a script to generate some synthetic data, or I can try to write one when I have time.
I found another open dataset, transformed it slightly, and ran the measurements.
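As a possible starting point for such a script, here is a rough Python sketch. Only the five key names come from the statistics above; the level names, timestamp format, logger and caller values, and the extra `fieldN` keys are all made up, and it does not try to reproduce the real distributions of key counts or line lengths.

```python
import json
import random
from datetime import datetime, timedelta, timezone

LEVELS = ["debug", "info", "warn", "error"]  # assumed level names


def synthetic_lines(n, seed=0):
    """Yield n JSON log lines with the five known keys plus 0-3 extra keys."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    ts = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary start time
    for i in range(n):
        ts += timedelta(milliseconds=rng.randint(1, 500))
        record = {
            "level": rng.choice(LEVELS),
            "ts": ts.isoformat(),  # assumed timestamp format
            "logger": f"service.module{rng.randint(1, 5)}",  # made-up names
            "msg": f"synthetic message {i}",
            "caller": f"pkg/file{rng.randint(1, 20)}.go:{rng.randint(1, 999)}",
        }
        # Hypothetical extra fields so the key count varies between lines.
        for k in range(rng.randint(0, 3)):
            record[f"field{k}"] = rng.randint(0, 10**6)
        yield json.dumps(record, separators=(",", ":"))


if __name__ == "__main__":
    for line in synthetic_lines(5):
        print(line)
```

Redirecting the output of a larger run to a file gives something to benchmark against, although real logs will have much more varied line lengths.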
Source file
Web robot detection - Server logs
Transformation command
```sh
pv -c -N input <public_v2.json | jq -c 'to_entries[] | {"request-id": .key} + .value + {"response": (.value.response | tonumber), "bytes": (.value.bytes | tonumber), level: "info"}' | (pv -c -N output >web-robot.log)
```
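In case the jq filter is hard to read: it turns each top-level entry of the source object into one flat record, keeps the entry key as "request-id", converts "response" and "bytes" to numbers, and adds a constant "level". A rough Python sketch of the same idea follows. The helper names are hypothetical, `to_number` is only an approximation of jq's `tonumber`, and the sample record is made up rather than taken from the dataset.

```python
import json


def to_number(value):
    """Rough stand-in for jq's `tonumber`: pass numbers through, parse numeric strings."""
    if isinstance(value, (int, float)):
        return value
    try:
        return int(value)
    except ValueError:
        return float(value)


def transform(dataset):
    """Yield one flat record per top-level entry, mirroring the jq filter above."""
    for request_id, value in dataset.items():
        record = {"request-id": request_id}
        record.update(value)  # later keys override earlier ones, like jq's `+`
        record["response"] = to_number(value["response"])
        record["bytes"] = to_number(value["bytes"])
        record["level"] = "info"
        yield record


# Made-up input in the assumed shape: {request id: {field: value, ...}}.
sample = {"req-1": {"response": "200", "bytes": "1024", "method": "GET"}}
for record in transform(sample):
    print(json.dumps(record, separators=(",", ":")))
```

Unlike the jq pipeline, this sketch needs the whole input object in memory as a Python dict, so it is meant as an explanation rather than a replacement for the command.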
Notes
- hlogf 1.4.1 had issues with parsing this file, so it was excluded
Measurements
Raw details
```
❯ hl --version
hl 0.29.5
❯ time hl web-robot.log -c -o /dev/null
hl web-robot.log -c -o /dev/null 12.06s user 0.72s system 930% cpu 1.374 total
# ---
❯ humanlog --version
humanlog version 0.7.6+deb0543
❯ time humanlog <web-robot.log --color always >/dev/null
humanlog> reading stdin...
humanlog --color always < web-robot.log > /dev/null 92.02s user 4.38s system 108% cpu 1:28.76 total
# ---
❯ fblog --version
fblog 4.10.0
❯ time fblog web-robot.log >/dev/null
fblog web-robot.log > /dev/null 25.85s user 1.65s system 98% cpu 28.001 total
❯ time fblog -d web-robot.log >/dev/null
fblog -d web-robot.log > /dev/null 131.54s user 14.14s system 99% cpu 2:26.33 total
# ---
❯ wc -cl web-robot.log
4091155 3312769299 web-robot.log
❯ sysctl -n machdep.cpu.brand_string
Apple M1 Max
```
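To compare the tools on a common scale, the wall-clock totals above can be converted to throughput. This is just arithmetic on the numbers in the raw details; the helper names are hypothetical, and MB here means 10^6 bytes.

```python
FILE_BYTES = 3312769299  # byte count from `wc -cl web-robot.log`

# Wall-clock "total" values copied from the `time` output above.
TOTALS = {
    "hl": "1.374",
    "humanlog": "1:28.76",
    "fblog": "28.001",
    "fblog -d": "2:26.33",
}


def parse_elapsed(text):
    """Convert a total such as "1.374" or "1:28.76" to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def throughput_mb_s(size_bytes, seconds):
    """Throughput in megabytes (10^6 bytes) per second."""
    return size_bytes / seconds / 1e6


for tool, total in TOTALS.items():
    secs = parse_elapsed(total)
    print(f"{tool:<10} {secs:>8.3f} s {throughput_mb_s(FILE_BYTES, secs):>8.1f} MB/s")
```

Note that the invocations differ slightly (file argument vs. stdin, explicit color flags), so the figures are indicative rather than a strict like-for-like comparison.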