wren icon indicating copy to clipboard operation
wren copied to clipboard

wrenCompile fails to initialize some fields of the parser

Open graphitemaster opened this issue 4 years ago • 8 comments

The parser fields previous and current are left in an uninitialized state when wrenCompile calls nextToken. This is caught by -Wuninitialized in gcc, but also breaks Wren when built with ThinLTO as the first two assignments in nextToken are removed from the generated code.

This is an amalgamated build but the relevant warning can be seen here

src/lib/wren.h:5186:20: warning: ‘parser.Parser::current’ is used uninitialized [-Wuninitialized]
 5186 |   parser->previous = parser->current;
      |   ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
src/lib/wren.h: In function ‘ObjFn* wrenCompile(WrenVM*, ObjModule*, const char*, bool, bool)’:
src/lib/wren.h:7901:10: note: ‘parser’ declared here
 7901 |   Parser parser;
      |          ^~~~~~

graphitemaster avatar May 26 '21 05:05 graphitemaster

This is odd, I run wren in valgrind from time to time, and the issue never showed up. Are you sure GCC is not a little bit optimistic at finding the error, and it is in fact impossible to happen?

mhermier avatar May 26 '21 05:05 mhermier

It's correct, wrenCompile starts like

  Parser parser;
  parser.vm = vm;
  parser.module = module;
  parser.source = source;

  parser.tokenStart = source;
  parser.currentChar = source;
  parser.currentLine = 1;
  parser.numParens = 0;

  // Zero-init the current token. This will get copied to previous when
  // nextToken() is called below.
  parser.next.type = TOKEN_ERROR;
  parser.next.start = source;
  parser.next.length = 0;
  parser.next.line = 0;
  parser.next.value = UNDEFINED_VAL;

  parser.printErrors = printErrors;
  parser.hasError = false;

  // Read the first token into next
  nextToken(&parser);

Nowhere in here is parser.current or parser.previous initialized, the stack allocation of Parser parser; leaves it uninitialized, the very first call to nextToken then does, as the first two statements in the function

// Lex the next token and store it in [parser.next].
static void nextToken(Parser* parser)
{
  parser->previous = parser->current;
  parser->current = parser->next;
  // [snip]

graphitemaster avatar May 26 '21 05:05 graphitemaster

nextToken is called twice, which copies next -> current and current -> previous so both should be set but there's no particular reason I didn't initialize them. It'd make sense to do so cos the compiler can't see the order of init there and being explicit is better.

ruby0x1 avatar May 26 '21 07:05 ruby0x1

The problem is that reading an uninitialized memory is always UB in C, and thus when nextToken() executes parser->previous = parser->current; (but current is uninitialized), it executes UB, allowing the compiler to remove this line or even the whole program (or format your hard drive).

ChayimFriedman2 avatar May 26 '21 22:05 ChayimFriedman2

I already said it's better to initialize them explicitly, which will happen soon, don't worry about your files on your PC until then (hyperbole doesn't make a point stronger in practice fwiw).

ruby0x1 avatar May 26 '21 22:05 ruby0x1

I just wanted to note it's not just better, but in fact required to conform the C standard. Sorry if I wasn't explicit enough.

ChayimFriedman2 avatar May 26 '21 22:05 ChayimFriedman2

Yeah, reading the uninitialized value invokes UB, it's why ThinLTO miscompiles the code and removes the assignments in nextToken because the linker optimizations assume code does not invoke UB. There's a free license to remove the code that would / does invoke UB since it shouldn't happen. Basically doesn't matter if eventually it gets cleared out correctly, one read of an uninitialized value is all it takes with aggressive optimizations sadly. Not saying I agree with how the standard works here or anything. Just observing how and why, pedantry aside, it's a simple necessary fix in my case.

graphitemaster avatar May 26 '21 23:05 graphitemaster

Fixed in #1012 :)

cxw42 avatar Aug 02 '21 12:08 cxw42