original-mawk getline waits for EOF if RS != "\n"

It seems that getline won't return until EOF unless RS is \n. This effectively means that long-running programs won't produce output until they are terminated.

E.g.

{ echo "test|one";sleep 2s; }|awk -vRS="|" 'NR == 1 { print; exit; }'

This won't print 'test' for two seconds. Is this intentional? Or have I missed something? All the other awks I tried (busybox, gawk, macOS awk) produce the expected result.

Dec 02 '21 23:12 lemnos

Ah, it looks like this happens independently of RS and has to do with an internal buffer. The following won't print anything when mawk is used (but will under the other implementations). Is there a POSIX-compliant way to disable this buffering? Is this permitted under the spec?

E.g.

perl -e '
	$|++;

	for($i=0;$i<2047;$i++) {
		print "a\n" 
	}

	while(1) {}
'|mawk 'BEGIN{getline;print}'

Dec 03 '21 00:12 lemnos

This appears to be a variation of issue #12 (no one's indicated that POSIX specifies a particular behavior).

Dec 03 '21 00:12 ThomasDickey

I see, this is quite unfortunate as it forces me to add -Wi and hope that the other awk implementations which may run my script don't complain.
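One way to limit the portability risk is to pass -Wi only when the awk on the PATH identifies itself as mawk. The following is a minimal sketch, not a tested recipe: it assumes that `awk -W version` prints a banner beginning with "mawk" on mawk, and that other implementations either print a different banner or fail quietly. The helper name `interactive_flag` is hypothetical.

```shell
#!/bin/sh
# Sketch: choose mawk's -Wi flag from a version banner.
# interactive_flag is a hypothetical helper, not part of any awk.
# Assumption: mawk's banner starts with "mawk"; anything else gets no flag.
interactive_flag() {
    case "$1" in
        mawk*) printf '%s\n' '-Wi' ;;
        *)     : ;;
    esac
}

# Assumption: "awk -W version" prints the banner on mawk. Errors from
# other implementations are discarded, and stdin is /dev/null so an awk
# that treats "version" as program text cannot block waiting for input.
banner=$(awk -W version 2>/dev/null </dev/null | head -n 1)
flag=$(interactive_flag "$banner")

# $flag is deliberately unquoted: an empty value must add no argument.
printf 'one\ntwo\n' | awk $flag 'NR == 1 { print; exit }'
```

The example keeps the default RS on purpose, because this thread also reports that RS is ignored when -Wi is in effect.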

I'm sure you have considered it at length, but I would like to add my voice to the chorus of others who favour making -Wi the default behaviour and making buffering explicit.

I have no hard data, but my suspicion is that the vast majority of awk invocations are run within scripts on relatively small amounts of data which expect real time output. For the cases in which larger data sets are parsed, and mawk is a deliberate choice because of its efficiency, it may make more sense to allow the user to explicitly specify the buffer size.

On another note, it seems that there is a related bug.

When -Wi is used RS is ignored.

E.g.

 echo "record,record"|mawk -Wi -vRS="," 'NR==1{print}'

prints "record,record" instead of "record".

I appreciate the quick response and greatly admire the work you do.

Dec 03 '21 00:12 lemnos

gawk doesn't complain about -Wi itself, but the option breaks argument parsing: the last part of the command line is taken as a file rather than as program text followed by a file. Interestingly, stdbuf usually works for most tools, such as grep, tr and sed, without needing their own unbuffering options (which sed and grep have). It also worked for gawk, making it produce output promptly even when not started from a terminal, e.g.

free -h -s 1 | stdbuf -o0 grep 'Mem' | stdbuf -oL awk '{print}' | cat

In the case of mawk, if this were a matter of input buffering, then I would have expected stdbuf -i0 to handle it, but it doesn't (I tried stdbuf -i0 -o0). Of course, gawk still has its own problems, such as not reacting to SIGPIPE under stdbuf -oL when, e.g., piped into head -n1 (though it does under stdbuf -o0, which is strange).

I must also note that unbuffered mode has worse performance, so making it the default would not be good. It would be great if it interacted better with stdbuf. When I experimented with sed and gawk earlier, stdbuf -oL gave better performance than fflush() in gawk or grep's --line-buffered option (I can't compare with sed, which only has an option for unbuffered mode).
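For reference, the fflush() approach mentioned above looks roughly like the sketch below. It only affects awk's output side, so it would not be expected to change the input buffering in mawk that this report is about. fflush() is widely supported (gawk, mawk, busybox awk), but its availability in any other awk is an assumption here, and `flush_each_line` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: flush awk's output after every record so that a downstream
# reader sees each line as soon as awk has processed it.
# flush_each_line is a hypothetical helper name.
flush_each_line() {
    awk '{ print; fflush() }'
}

# Stand-in for a slow producer such as "free -h -s 1".
printf 'Mem: 1\nSwap: 2\n' | flush_each_line | cat
```

As noted in the comment above, the per-record flush is what cost performance in practice, so this trades throughput for latency.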

Jan 06 '22 18:01 nick87720z
