original-mawk
system("") does not flush output
according to the GNU awk manual, awk is required to flush its output when a system() call is executed, and so system("") serves as a neat and portable trick to force a flush of stdout.
http://gnu.huihoo.org/gawk-3.0.3/html_node/gawk_126.html
see "Controlling Output Buffering with system"
actually, I don't see in the text where gawk says this is "required", but only where it comments that
fflush is a recent (1994) addition to the Bell Labs research version of
awk; it is not part of the POSIX standard, and will not be available if
`--posix' has been specified on the command line (see section Command
Line Options).
and then
gawk extends the fflush function in two ways. The first is to allow
no argument at all. In this case, the buffer for the standard output
is flushed. The second way is to allow the null string ("") as the
argument.
let me quote the whole paragraph here:
Controlling Output Buffering with system
The fflush function provides explicit control over output buffering for individual files and pipes. However, its use is not portable to many other awk implementations. An alternative method to flush output buffers is by calling system with a null string as its argument:
system("") # flush output
gawk treats this use of the system function as a special case, and is smart enough not to run a shell (or other command interpreter) with the empty command. Therefore, with gawk, this idiom is not only useful, it is efficient. While this method should work with other awk implementations, it will not necessarily avoid starting an unnecessary shell. (Other implementations may only flush the buffer associated with the standard output, and not necessarily all buffered output.)
If you think about what a programmer expects, it makes sense that system should flush any pending output. The following program:
BEGIN {
    print "first print"
    system("echo system echo")
    print "second print"
}
must print
first print
system echo
second print
and not
system echo
first print
second print
If awk did not flush its buffers before calling system, the latter (undesirable) output is what you would see.
i'm reading "must print" as a requirement, but you're right in that the POSIX specification[0] does not mention this requirement explicitly.
however it really makes sense to implement it that way.
thus the system() function should just fflush(stdout); before executing the syscall.
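A minimal sketch of how the ordering from the quoted gawk example could be checked against whichever awk is installed. It assumes `awk` is on PATH; the function name `ordered_output` is hypothetical and does not come from this thread. Piping through cat keeps awk's stdout from being a terminal, so its output is block-buffered and a missing flush would show up as misordered lines.

```shell
#!/bin/sh
# Hypothetical helper: run the manual's example with stdout attached to a pipe.
ordered_output() {
    awk 'BEGIN {
        print "first print"
        system("echo system echo")
        print "second print"
    }' | cat
}

ordered_output
```

An awk that flushes its buffers before system() emits the three lines in program order.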
since debian/ubuntu ship mawk as the default awk implementation, i currently need an ugly workaround to get the desired behaviour for all awk implementations in use:
https://github.com/sabotage-linux/sabotage/commit/5c90662115f0c4b28df5472f0dd057c34c9f6e43
i hope you agree with my assessment.
cheers.
[0] http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
mawk does flush all output when doing system. It doesn't provide a special case where system is used as a replacement for flush.
What may be confusing the OP is that mawk buffers its input. In other words, it won't start processing its input until a full buffer has been read. It's the only utility implementation I know of that does that (which is annoying at times).
In:
$ (echo 1; sleep 2; echo 2) | mawk '1;NR == 1 {system("echo X")}'
1
X
2
You do see the output in the right order (mawk does flush its output), but you have to wait the 2 seconds to get it. That's where it differs from other awk implementations.
interesting. i didn't reply yet because i analyzed the source and mawk does indeed flush all fds (including stdout) when calling system(), but still behaves as if it does not, which forces me to pass -W interactive to make mawk behave as expected.
-W interactive saved my day!
As a consideration it might be reasonable to make -W interactive the default operation if stdout is a terminal/tty. Having to special-case mawk with this flag makes it difficult to use it portably.
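The special-casing described above might look like the sketch below. It assumes that mawk identifies itself on the first line of its `-W version` output and that other awk implementations either print a different banner or reject the option. The wrapper name `run_awk` is hypothetical, and this is not the workaround from the linked sabotage commit.

```shell
#!/bin/sh
# Hypothetical wrapper: pass -W interactive only when awk is mawk.
# Assumption: mawk's `-W version` output starts with "mawk"; other awks
# print a different banner or fail, so the grep does not match.
run_awk() {
    if awk -W version 2>&1 </dev/null | head -n 1 | grep -q '^mawk'; then
        awk -W interactive "$@"
    else
        awk "$@"
    fi
}

# Usage: doubles each input number, one line at a time.
printf '1\n2\n' | run_awk '{ print $1 * 2 }'
```

The detection runs on every call, which keeps the sketch short; a script that invokes awk often could cache the result in a variable instead.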