original-mawk
getline waits for EOF if RS != "\n"
It seems that getline won't return until EOF unless RS is \n. This effectively means that long-running programs won't produce output until they are terminated.
E.g.:
{ echo "test|one";sleep 2s; }|awk -vRS="|" 'NR == 1 { print; exit; }'
This doesn't print 'test' until two seconds have passed. Is this intentional, or have I missed something? All the other awks I tried (BusyBox, gawk, macOS awk) produce the expected result.
Ah, it looks like this happens independently of RS and has to do with an internal buffer. The following won't print anything when mawk is used (but will under the other implementations). Is there a POSIX-compliant way to disable this buffering? Is this permitted under the spec?
E.g.:
perl -e '
$|++;
for($i=0;$i<2047;$i++) {
print "a\n"
}
while(1) {}
'|mawk 'BEGIN{getline;print}'
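To make the buffering question concrete, here is a hypothetical Python sketch of two ways a reader could look for the first record on a pipe. It is not mawk's code: the function names and the 4096-byte buffer are assumptions (the size merely echoes the 4094-byte Perl example above). The first strategy scans for the separator after every read(), which is what the other awks appear to do; the second keeps calling read() until its buffer is full or EOF arrives, which matches the symptoms described. The demo makes the pipe non-blocking so that "would wait" shows up as a BlockingIOError instead of a hang.

```python
import os

def read_record_incremental(fd: int, rs: bytes, bufsize: int = 4096) -> bytes:
    """Return the first record as soon as a separator has been seen.

    os.read() on a pipe returns whatever bytes are currently available,
    so checking for the separator after every call never waits needlessly.
    """
    data = b""
    while rs not in data:
        chunk = os.read(fd, bufsize)
        if not chunk:  # EOF: whatever we have is the last record
            return data
        data += chunk
    return data.split(rs, 1)[0]

def read_record_fill_buffer(fd: int, rs: bytes, bufsize: int = 4096) -> bytes:
    """Keep reading until the buffer is full (or EOF), then split.

    On a pipe whose writer is alive but idle, this waits for more input
    even though a complete record has already arrived.
    """
    data = b""
    while len(data) < bufsize:
        chunk = os.read(fd, bufsize - len(data))
        if not chunk:  # EOF
            break
        data += chunk
    return data.split(rs, 1)[0]

if __name__ == "__main__":
    for reader in (read_record_incremental, read_record_fill_buffer):
        r, w = os.pipe()
        os.write(w, b"test|one")  # the writer stays open, like `sleep 2s`
        os.set_blocking(r, False)  # turn "would wait" into an exception
        try:
            print(reader.__name__, reader(r, b"|"))
        except BlockingIOError:
            print(reader.__name__, "would block until more input or EOF")
        os.close(r)
        os.close(w)
```

Once the writer closes its end, both strategies return the same record; they differ only in how long they wait.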
This appears to be a variation of issue #12 (no one's indicated that POSIX specifies a particular behavior).
I see. This is quite unfortunate, as it forces me to add -Wi and hope that the other awk implementations which may run my script don't complain.
I'm sure you have considered it at length, but I would like to add my voice to the chorus of others who favour making -Wi the default behaviour and making buffering explicit.
I have no hard data, but my suspicion is that the vast majority of awk invocations are run within scripts on relatively small amounts of data and expect real-time output. For the cases in which larger data sets are parsed and mawk is a deliberate choice because of its efficiency, it may make more sense to let the user explicitly specify the buffer size.
On another note, there seems to be a related bug: when -Wi is used, RS is ignored.
E.g.:
echo "record,record"|mawk -Wi -vRS="," 'NR==1{print}'
prints "record,record" instead of "record".
I appreciate the quick response and greatly admire the work you do.
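Interactive reading and a custom RS need not conflict. The following hypothetical sketch (the names are illustrative, not mawk's) splits input on a literal, non-empty RS as each chunk arrives, carrying a partial record, or a separator split across two chunks, over to the next chunk. It assumes a plain byte-string separator and does not cover paragraph mode (RS="") or regular-expression separators.

```python
from typing import Iterable, Iterator

def split_records(chunks: Iterable[bytes], rs: bytes) -> Iterator[bytes]:
    """Yield records separated by rs as soon as each one is complete.

    A trailing partial record (or a separator cut in half by a chunk
    boundary) is kept in `pending` until more input or EOF arrives.
    """
    if not rs:
        raise ValueError("this sketch only handles a non-empty literal RS")
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last separator is complete; the tail may not be.
        *complete, pending = pending.split(rs)
        yield from complete
    if pending:  # final record without a trailing separator
        yield pending

if __name__ == "__main__":
    print(list(split_records([b"record,record\n"], b",")))
```

Fed the input from the echo command above with RS=",", the sketch yields b"record" first and then b"record\n" (echo's trailing newline stays part of the last record), and the first record is produced without waiting for any further chunk.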
gawk doesn't complain about -Wi itself, but it breaks argument parsing: the last argument is treated as a file rather than as the program text that precedes the file operands.
What's interesting is that stdbuf is expected to work for most tools, such as grep, tr and sed, without resorting to their own unbuffering options (which sed and grep both have). It also worked for gawk, making it produce output promptly even when not started from a terminal, e.g.:
free -h -s 1 | stdbuf -o0 grep 'Mem' | stdbuf -oL awk '{print}' | cat
In the case of mawk, if this were about unbuffered input, I would have expected stdbuf -i0 to fix it, but it doesn't (I also tried stdbuf -i0 -o0). Of course, gawk still has its own problems, such as not reacting to SIGPIPE under stdbuf -oL when piped into, e.g., head -n1 (although it does under stdbuf -o0, which is strange).
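A possible reason stdbuf has no effect, under one assumption: GNU stdbuf works by preloading a library that calls setvbuf() on the C stdio streams, and its manual notes that commands which adjust their own buffering, or which don't use stdio streams for I/O, are unaffected. If mawk reads input through its own read()-based buffer, as the behaviour in this thread suggests, stdbuf -i0 has nothing to change. The Python sketch below is only an analogy on the output side: Python's BufferedWriter stands in for such a private buffer (Python's io layer does not go through stdio either), and the bytes reach the pipe only on an explicit flush.

```python
import io
import os
import select

def pipe_readable(fd: int) -> bool:
    """Return True if fd has bytes ready to read right now (zero-timeout select)."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)

def demo() -> tuple[bool, bool]:
    """Write one line through a private buffer; probe the pipe before and after flush."""
    r, w = os.pipe()
    # A buffer the program manages itself; a preloaded setvbuf() would not reach it.
    writer = io.open(w, "wb", buffering=8192)
    try:
        writer.write(b"Mem: ...\n")
        before = pipe_readable(r)  # bytes are still in user space
        writer.flush()             # explicit flush, analogous to fflush() in awk
        after = pipe_readable(r)   # bytes have now reached the pipe
    finally:
        writer.close()  # also closes w
        os.close(r)
    return before, after

if __name__ == "__main__":
    print(demo())
```

Running it prints (False, True): nothing is visible to the reader until the program itself decides to flush, which is why a program-specific switch (-Wi, fflush(), --line-buffered) is needed when stdbuf cannot reach the buffer.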
I must also note that unbuffered mode has worse performance, so making it the default would not be good. It would be great if mawk interacted better with stdbuf. When I experimented with sed and gawk before, using stdbuf -oL gave better performance than calling fflush() in gawk or using grep's --line-buffered option (I can't compare with sed, which only has an option for fully unbuffered mode).
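On the performance point: a large part of the cost of line-buffered or unbuffered output is typically the number of write() calls that reach the kernel. The hypothetical Python sketch below counts those calls with a fake raw sink (CountingSink and count_writes are illustrative names, not library APIs) for line-buffered versus block-buffered text output. It only shows the general trade-off; it does not model gawk's fflush() or stdbuf specifically.

```python
import io

class CountingSink(io.RawIOBase):
    """Raw byte sink that records how many write() calls reach it."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.calls += 1
        self.data += b
        return len(b)

def count_writes(lines: int, line_buffered: bool) -> int:
    """Write `lines` short lines and return how many raw writes occurred."""
    sink = CountingSink()
    buffered = io.BufferedWriter(sink, buffer_size=8192)
    text = io.TextIOWrapper(buffered, encoding="ascii", line_buffering=line_buffered)
    for _ in range(lines):
        text.write("a\n")
    text.flush()  # push out anything still held in the buffers
    return sink.calls

if __name__ == "__main__":
    print("line-buffered:", count_writes(1000, line_buffered=True))
    print("block-buffered:", count_writes(1000, line_buffered=False))
```

With line buffering every newline forces a separate raw write, while block buffering lets the 2000 bytes go out in very few calls.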