Cannot build IANA tz database 2022b
The latest commit (as of this report) of onetrueawk cannot build release 2022b of the IANA tz database.
Steps to reproduce
- Use latest commit from this repository; build awk and install in $PATH
- Check out tag 2022b from https://github.com/eggert/tz
- Run make rearguard_tarballs
Result:
awk: syntax error at source line 110 source file ziguard.awk
context is
stdoff_column = 2 * >>> / <<< ^Zone/ + 1
awk: illegal statement at source line 110 source file ziguard.awk
awk: illegal statement at source line 110 source file ziguard.awk
make: *** [main.zi] Error 2
Regression: Works in FreeBSD 13.0 (earlier commit of onetrueawk, version 20190529). Works with gawk, among others.
A more compact one-liner test case:
jhawk@lrr ~ % echo foo | /usr/bin/awk '{stdoff_column = 2 * /^Zone/ + 1}'
/usr/bin/awk: syntax error at source line 1
context is
{stdoff_column = 2 * /^Zone/ >>> + <<< 1}
/usr/bin/awk: illegal statement at source line 1
Versus gawk:
jhawk@lrr ~ % echo foo | gawk '{stdoff_column = 2 * /^Zone/ + 1}'
jhawk@lrr ~ %
The current Single UNIX Specification page for awk says
When an ERE token appears as an expression in any context other than as the right-hand of the '~' or "!~" operator or as one of the built-in function arguments described below, the value of the resulting expression shall be the equivalent of:
$0 ~ /ere/
I presume that /^Zone/ in 2 * /^Zone/ + 1 is an "ERE token".
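To make the quoted rule concrete, here is a minimal sketch that spells out the equivalent form by hand. It assumes some POSIX awk is on $PATH, and the two input lines are made up for illustration. Because the match is written explicitly and parenthesized, it does not depend on how a given awk parses a bare /ere/ inside an arithmetic expression.

```shell
# Explicit form of the spec's equivalence: a bare /^Zone/ used as an
# expression should behave like ($0 ~ /^Zone/), i.e. 1 on match, 0 otherwise.
# The input lines below are hypothetical sample data.
explicit_result=$(printf 'Zone X\nRule Y\n' | awk '{ print 2 * ($0 ~ /^Zone/) + 1 }')

# First line matches (2*1+1), second does not (2*0+1).
echo "$explicit_result"
```

With a conforming awk this should print 3 for the line starting with Zone and 1 for the other line.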
That spec speaks of "ERE tokens", which appear to be of the form "/ere/", but I don't see any specification of what an "ERE token" is in the spec.
See Lexical Conventions in the spec:
- The token ERE represents an extended regular expression constant. An ERE constant shall begin with the <slash> character. Within an ERE constant, a <backslash> character shall be considered to begin an escape sequence as specified in the table in XBD File Format Notation. In addition, the escape sequences in Escape Sequences in awk shall be recognized. The application shall ensure that a <newline> does not occur within an ERE constant. An ERE constant shall be terminated by the first unescaped occurrence of the <slash> character after the one that begins the ERE constant. The extended regular expression represented by the ERE constant shall be the sequence of all unescaped characters and values of escape sequences between, but not including, the two delimiting <slash> characters.
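As a small illustration of the termination rule quoted above (the ERE constant ends at the first unescaped slash), the following sketch matches a literal slash by escaping it. The input string is just an example, and any POSIX awk on $PATH is assumed.

```shell
# The \/ inside the ERE constant is an escaped slash, so it does not
# terminate the constant; the closing delimiter is the final unescaped /.
slash_match=$(echo 'America/New_York' | awk '/^America\/New/ { print "match" }')

echo "$slash_match"
```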
Placing the regex in parens makes the yacc grammar happy and appears to produce the correct result.
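For reference, a sketch of the parenthesized workaround applied to the one-liner test case from the report. A print has been added so the value is visible, and the input lines are invented; the statement itself is otherwise the one from ziguard.awk with parentheses around the regex.

```shell
# Workaround: wrap the bare regex in parentheses so the parser sees a
# grouped expression instead of a '/' following '*'.
with_zone=$(echo 'Zone Europe/Paris' | awk '{ stdoff_column = 2 * (/^Zone/) + 1; print stdoff_column }')
no_zone=$(echo 'foo' | awk '{ stdoff_column = 2 * (/^Zone/) + 1; print stdoff_column }')

# Expect 3 when the line starts with Zone, 1 otherwise.
echo "$with_zone $no_zone"
```

This form should be accepted by onetrueawk as well as by gawk, so it is a candidate for a portable spelling while the grammar question is open.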
The obvious fix is to simply add:
| re
to the end of the term rule in awkgram.y, but that does increase the shift/reduce and reduce/reduce conflicts. The tests still pass though ;-)
Hi Deborah, thanks for the report. I have a FreeBSD 13 at hand, and the 20190529 release gives the same error. I have tested earlier versions as well. So it is not a change in our release of awk that now fails the IANA tz database build.
20190529 release gives the same error.
Yes, unfortunately I tested 20190529 incorrectly and so I mistakenly told Deborah that 20190529 did not have the bug. Sorry about that. I.e., this is not a regression (though it is still a bug).
For what it's worth, Solaris 10 /usr/bin/nawk (which has a version string saying "Oct 11, 1989") has the same bug. And similarly for Solaris 10 /usr/bin/awk (aka "oawk"), which has no version string but is even older. Evidently the bug has been around for a while.
Evidently the bug has been around for a while.
yep, I've tested all those, including sol 8/10 nawk, awk, as well as MKS awk which solaris shipped as sys5 awk.
@millert the obvious fix gives us 225 reduce/reduce conflicts. We can remove re from the | re | term combinations, which reduces the reduce/reduce conflicts somewhat; better, but not great. To be continued.
@plan9 The grammar is definitely an area where "Here there be dragons." Tread very, very, carefully.