wrenCompile fails to initialize some fields of the parser
The parser fields previous and current are left in an uninitialized state when wrenCompile calls nextToken. This is caught by -Wuninitialized in gcc, but also breaks Wren when built with ThinLTO as the first two assignments in nextToken are removed from the generated code.
This is an amalgamated build but the relevant warning can be seen here
src/lib/wren.h:5186:20: warning: ‘parser.Parser::current’ is used uninitialized [-Wuninitialized]
5186 | parser->previous = parser->current;
| ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
src/lib/wren.h: In function ‘ObjFn* wrenCompile(WrenVM*, ObjModule*, const char*, bool, bool)’:
src/lib/wren.h:7901:10: note: ‘parser’ declared here
7901 | Parser parser;
| ^~~~~~
This is odd, I run wren in valgrind from time to time, and the issue never showed up. Are you sure GCC is not a little bit optimistic at finding the error, and it is in fact impossible to happen?
It's correct, wrenCompile starts like
Parser parser;
parser.vm = vm;
parser.module = module;
parser.source = source;
parser.tokenStart = source;
parser.currentChar = source;
parser.currentLine = 1;
parser.numParens = 0;
// Zero-init the current token. This will get copied to previous when
// nextToken() is called below.
parser.next.type = TOKEN_ERROR;
parser.next.start = source;
parser.next.length = 0;
parser.next.line = 0;
parser.next.value = UNDEFINED_VAL;
parser.printErrors = printErrors;
parser.hasError = false;
// Read the first token into next
nextToken(&parser);
Nowhere in here is parser.current or parser.previous initialized, the stack allocation of Parser parser; leaves it uninitialized, the very first call to nextToken then does, as the first two statements in the function
// Lex the next token and store it in [parser.next].
static void nextToken(Parser* parser)
{
parser->previous = parser->current;
parser->current = parser->next;
// [snip]
nextToken is called twice, which copies next -> current and current -> previous so both should be set but there's no particular reason I didn't initialize them. It'd make sense to do so cos the compiler can't see the order of init there and being explicit is better.
The problem is that reading an uninitialized memory is always UB in C, and thus when nextToken() executes parser->previous = parser->current; (but current is uninitialized), it executes UB, allowing the compiler to remove this line or even the whole program (or format your hard drive).
I already said it's better to initialize them explicitly, which will happen soon, don't worry about your files on your PC until then (hyperbole doesn't make a point stronger in practice fwiw).
I just wanted to note it's not just better, but in fact required to conform the C standard. Sorry if I wasn't explicit enough.
Yeah, reading the uninitialized value invokes UB, it's why ThinLTO miscompiles the code and removes the assignments in nextToken because the linker optimizations assume code does not invoke UB. There's a free license to remove the code that would / does invoke UB since it shouldn't happen. Basically doesn't matter if eventually it gets cleared out correctly, one read of an uninitialized value is all it takes with aggressive optimizations sadly. Not saying I agree with how the standard works here or anything. Just observing how and why, pedantry aside, it's a simple necessary fix in my case.
Fixed in #1012 :)