awk icon indicating copy to clipboard operation
awk copied to clipboard

fix edge case where FS is changed on commandline

Open ghshephard opened this issue 3 years ago • 2 comments

FS is not set when it's a value assigned on the command line:

$ cat > t1
1,2 3
4,5 6

shephard@Gordons-Air:~
$ awk '{printf "x"FS"x  "; print $1":"$2}' t1 FS="[ ,]" t1 t1
x x  1,2:3
x x  4,5:6
x[ ,]x  1,2:3
x[ ,]x  4:5
x[ ,]x  1:2
x[ ,]x  4:5
$ awk --version
awk version 20200816`

ghshephard avatar Oct 13 '22 07:10 ghshephard

Fix for issue https://github.com/onetrueawk/awk/issues/162

ghshephard avatar Oct 13 '22 07:10 ghshephard

Hi, ghshephard:

Since savefs() exists only to preserve the value of FS at the time a record is set, it should only be called when a value is assigned to $0. The logic is there in setsval (and setfval) to handle the side-effects, but getrec bypasses it to avoid copying (sub, gsub, and perhaps others would also benefit from a nocopy-capable setsval).

Calling savefs at other times can clobber inputFS while its record is valid and unsplit. For example, getline var will trigger the FS command line assignment if it needs to go to the next file for the next record, but it shouldn't affect an unsplit $0. Similarly, an FS command-line assignment before entering an END action should not affect an unsplit $0.

In the following examples, your modification discards inputFS before the record has been split, so that when the field splitting eventually happens it's done with the updated and incorrect FS.

$ # Correct output for both examples is a:b
$ echo a:b > f1
$ echo c:d > f2
$ ./shephard163.out '{getline x; print $1} ' f1 FS=: f2 
a
$ ./shephard163.out 'END {print $1} ' f1 FS=:
a

The simplest fix, unless I've missed something, is to postpone the side-effect handling in getrec until the record's been read.

diff --git a/lib.c b/lib.c
index ebe296f..2da6dfe 100644
--- a/lib.c
+++ b/lib.c
@@ -150,11 +150,6 @@ int getrec(char **pbuf, int *pbufsize, bool isrecord)	/* get next input record *
 	}
 	DPRINTF("RS=<%s>, FS=<%s>, ARGC=%g, FILENAME=%s\n",
 		*RS, *FS, *ARGC, *FILENAME);
-	if (isrecord) {
-		donefld = false;
-		donerec = true;
-		savefs();
-	}
 	saveb0 = buf[0];
 	buf[0] = 0;
 	while (argno < *ARGC || infile == stdin) {
@@ -194,6 +189,9 @@ int getrec(char **pbuf, int *pbufsize, bool isrecord)	/* get next input record *
 					fldtab[0]->fval = result;
 					fldtab[0]->tval |= NUM;
 				}
+				donefld = false;
+				donerec = true;
+				savefs();
 			}
 			setfval(nrloc, nrloc->fval+1);
 			setfval(fnrloc, fnrloc->fval+1);

That passes the test suite, fixes the issue you discovered, and shouldn't call savefs unnecessarily.

Regards, Miguel

mpinjr avatar Oct 14 '22 21:10 mpinjr

hi miguel, thanks for the discussion of the issue and the fix.

plan9 avatar Oct 21 '22 04:10 plan9

latest release includes @mpinjr fix.

plan9 avatar Sep 10 '23 14:09 plan9