Reading from `/dev/stdin` is very slow
Processing this 7.5M line file normally takes ~2-3 minutes on my laptop:
$ time ./mtail -emit_prog_label=false -one_shot -one_shot_format=prometheus -logtostderr -progs=lc -logs access-ssl.log.6
I1010 12:23:55.906123 179121 main.go:117] mtail version 3.0.0-rc50 git revision 89b462d7a83bab283e84a3b6e592a964090f7f4c go version go1.18.3 go arch amd64 go os linux
I1010 12:23:55.906203 179121 main.go:118] Commandline: ["./mtail" "-emit_prog_label=false" "-one_shot" "-one_shot_format=prometheus" "-logtostderr" "-progs=lc" "-logs" "access-ssl.log.6"]
I1010 12:23:55.906590 179121 store.go:189] Starting metric store expiry loop every 1h0m0s
I1010 12:23:55.907672 179121 runtime.go:188] Loaded program lc.mtail
I1010 12:23:55.907708 179121 runtime.go:84] unmarking lc.mtail
I1010 12:23:55.907763 179121 logstream.go:61] Parsed url as /tmp/access-ssl.log.6
I1010 12:23:55.907871 179121 filestream.go:278] signalling stop at next EOF
I1010 12:23:55.907896 179121 tail.go:287] Tailing /tmp/access-ssl.log.6
I1010 12:23:55.907986 179121 tail.go:343] No polling loop in oneshot mode.
I1010 12:23:55.908012 179121 mtail.go:126] Listening on [::]:3903
I1010 12:23:55.908022 179121 tail.go:315] No gc loop in oneshot mode.
I1010 12:26:51.737588 179121 runtime.go:288] END OF LINE
I1010 12:26:51.737615 179121 vm.go:1059] VM "lc.mtail" finished
# HELP lines defined at lc.mtail:1:9-13
# TYPE lines counter
lines 7.519629e+06
real 2m55,853s
user 3m37,175s
sys 0m29,575s
But when the same file is piped in on standard input like this:
$ cat access-ssl.log.6 | time ./mtail -emit_prog_label=false -one_shot -one_shot_format=prometheus -logtostderr -progs=lc -logs /dev/stdin
it takes forever, reading ~2.5k lines/minute:
$ http :3903/metrics|grep stdin; sleep 5m; http :3903/metrics|grep stdin
mtail_log_lines_total{logfile="/dev/stdin"} 16550
mtail_log_lines_total{logfile="/dev/stdin"} 29579
Tested on the latest release downloaded from GitHub, using the simplest line-counter program:
$ cat lc/lc.mtail
counter lines
/$/ {
  lines++
}
Cool bug, thanks for the report!
If someone picks this up before me, I'd try to attach a profiler to see what's taking so long.
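For a Go binary, one generic way to do that is via the standard net/http/pprof handlers; below is a minimal standalone sketch of wiring them up (I haven't checked whether the released mtail binary already registers them on its HTTP port, so treat this as illustrative only):

```go
// Sketch only: expose the standard net/http/pprof endpoints so a CPU profile
// can be captured while the process is busy chewing through stdin.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	go func() {
		// Serve the profiling endpoints on a side port, away from normal work.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the normal program work would go here; block forever for the sketch.
	select {}
}
```

With that in place, `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30` grabs a 30-second CPU profile from the running process.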
We've detected the same issue with mtail 3.0.0-rc50 (shipped with Debian bookworm) and with the latest available version (3.0.0-rc55).
The `perf top` report looks like this:
14.22% mtail mtail [.] regexp.(*Regexp).tryBacktrack
8.83% mtail [kernel.kallsyms] [k] finish_task_switch.isra.0
3.73% mtail mtail [.] runtime.mallocgc
3.58% mtail [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.79% mtail mtail [.] regexp/syntax.(*Inst).MatchRunePos
2.71% mtail mtail [.] github.com/google/mtail/internal/runtime/vm.(*VM).execute
2.68% mtail mtail [.] github.com/google/mtail/internal/tailer/logstream.decodeAndSend
2.50% mtail mtail [.] runtime.usleep.abi0
2.39% mtail mtail [.] runtime.futex.abi0
1.56% mtail mtail [.] bytes.(*Buffer).WriteByte
1.38% mtail mtail [.] regexp.(*inputString).step
1.34% mtail mtail [.] unicode/utf8.DecodeRune
1.27% mtail mtail [.] bytes.(*Buffer).WriteRune
1.19% mtail mtail [.] runtime.(*mspan).writeHeapBitsSmall
1.16% mtail mtail [.] runtime/internal/syscall.Syscall6
1.12% mtail mtail [.] runtime.(*_panic).nextDefer
1.01% mtail mtail [.] runtime.memclrNoHeapPointers
0.98% mtail [vdso] [.] 0x000000000000072f
0.98% mtail mtail [.] runtime.findRunnable
0.87% mtail mtail [.] runtime.lock2
0.80% mtail mtail [.] runtime.stealWork
0.72% mtail [kernel.kallsyms] [k] syscall_enter_from_user_mode
0.72% mtail mtail [.] runtime.scanobject
0.69% mtail mtail [.] runtime.chanrecv
0.69% mtail mtail [.] runtime.growslice
0.69% mtail mtail [.] runtime.unlock2
0.65% mtail mtail [.] runtime.greyobject
0.58% mtail [vdso] [.] __vdso_clock_gettime
0.58% mtail mtail [.] runtime.casgstatus
0.58% mtail mtail [.] runtime.chansend
0.51% mtail [kernel.kallsyms] [k] exit_to_user_mode_prepare
0.51% mtail mtail [.] runtime.deferreturn
0.51% mtail mtail [.] runtime.scanblock
0.51% mtail mtail [.] runtime.selectgo
In our setup, mtail 3.0.0~rc43 (shipped with Debian bullseye) was working as expected.
latest mtail at head:
jaq% time ./mtail -progs examples/rsyncd.mtail -one_shot -logs internal/mtail/testdata/rsyncd.log
0.01s user 0.01s system 111% cpu 0.017 total
jaq% time ./mtail -logs - -progs examples/rsyncd.mtail < internal/mtail/testdata/rsyncd.log
0.04s user 0.00s system 3% cpu 1.276 total
Wild guess: the read buffer is too small on pipes, so we're doing a lot of syscalls.
Increasing the read buffer from 4096 to 65536 had this effect:
jaq% time ./mtail -logs - -progs examples/rsyncd.mtail < internal/mtail/testdata/rsyncd.log
0.01s user 0.01s system 8% cpu 0.269 total
Total time drops and CPU utilisation goes up, so less time is spent waiting on syscalls (again guessing, but it seems obvious -- I could hook up a trace and get a profile of time spent blocked on syscalls to confirm).
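For illustration only (this is a toy, not the mtail code path; the names are made up): the buffer size is just the cap on how much each read(2) can return, so a bigger buffer means fewer syscalls per megabyte of piped input.

```go
// Toy sketch: count lines read from stdin with a configurable read buffer,
// to show why buffer size matters when the input is a pipe.
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// readBufferSize caps how many bytes a single read(2) can return; with a
// pipe, 4096 means roughly one syscall per page of input.
const readBufferSize = 64 * 1024

func main() {
	buf := make([]byte, readBufferSize)
	lines := 0
	for {
		n, err := os.Stdin.Read(buf)
		if n > 0 {
			lines += bytes.Count(buf[:n], []byte{'\n'})
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	fmt.Println("lines:", lines)
}
```

Running it under `strace -c -e trace=read` with readBufferSize set to 4096 and then 65536 makes the syscall-count difference visible.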
A 128K buffer seems pretty good. I can imagine an improvement where mtail asks the kernel what the read buffer size actually is; a rough sketch follows the list below.
- Pipes use `fcntl(..., F_GETPIPE_SZ)`.
- Sockets use `getsockopt(..., SO_RCVBUF)`.
- Datagrams use whatever you set `net.core.rmem_default` to, but (wild guess) you can use getsockopt here too.
- Regular files don't seem to have a read buffer (because the entire file is available to the VFS, I presume), and anecdotal data on the net says maybe 4GiB.
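Here's a rough, Linux-only sketch of that lookup using golang.org/x/sys/unix (the helper name and fallback are made up for illustration; this isn't code in the tree):

```go
//go:build linux

// Sketch only (not mtail's actual code): ask the kernel how big a stream's
// buffer is, so the userspace read buffer can be sized to match.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// kernelBufferSize returns the kernel-side buffer size for f, falling back
// to a default when the stream type has no obvious answer (regular files).
func kernelBufferSize(f *os.File, fallback int) int {
	fi, err := f.Stat()
	if err != nil {
		return fallback
	}
	switch {
	case fi.Mode()&os.ModeNamedPipe != 0:
		// Pipes: fcntl(fd, F_GETPIPE_SZ) reports the pipe buffer size.
		if sz, err := unix.FcntlInt(f.Fd(), unix.F_GETPIPE_SZ, 0); err == nil {
			return sz
		}
	case fi.Mode()&os.ModeSocket != 0:
		// Sockets (and, I'd guess, datagram sockets too): SO_RCVBUF.
		if sz, err := unix.GetsockoptInt(int(f.Fd()), unix.SOL_SOCKET, unix.SO_RCVBUF); err == nil {
			return sz
		}
	}
	return fallback
}

func main() {
	fmt.Println(kernelBufferSize(os.Stdin, 64*1024))
}
```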
Dealing with buffer sizes on Linux seems tractable but tedious; I looked into Windows support and couldn't get a satisfying answer.
Increasing buffers per stream is also going to increase RAM usage, so I don't want to go wild here without better reasoning.
Finally, the stdin read rate is still 10x slower than direct file reads. I presume this is because the pipe buffer requires copying into kernel memory and back out again, while direct file I/O has fewer (maybe zero) copies.