nbind icon indicating copy to clipboard operation
nbind copied to clipboard

Memory corruption on const functions w/ Emscripten

Open arcanis opened this issue 9 years ago • 10 comments

I'm not sure if the issue comes from nbind or emscripten itself. Here is a minimal testcase repository to help debug this issue: https://github.com/arcanis/nbind-weird-behaviour (c file / js file).

When I run the code (npm start), I get the following:

client side row 42
client side column 66
row 255
column 0
row 0
column 712
row 0
column 712
row 712
column 3724
row 712
column 3724
row 3724
column 752
row 3724
column 752
row 752
column 3690
row 752
column 3690
row 3690
column 48
row 3690
column 48
row 48
column 0
row 48
column 0
row 0
column 752
row 0
column 752
row 752
column 3607

As you can see, as the test function gets called, the data it prints begin to change (but it worth noting that even during the first run, they're wrong). I'm not sure what might cause this, especially since this function shouldn't have any side effect :(

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 1.36.0
clang version 3.9.0 (https://github.com/kripken/emscripten-fastcomp-clang/ 271ce598c3d1fe74efadc254f5be1b57edea9f41) (https://github.com/kripken/emscripten-fastcomp/ 61acfb230665464544f2e8db292f8999fc3c628c) (emscripten 1.36.0 : 1.36.0)

arcanis avatar Dec 05 '16 09:12 arcanis

Nevermind, I just go too fast, and read the documentation too little ...

For anyone with this issue: toJS & fromJS !

arcanis avatar Dec 05 '16 10:12 arcanis

I still feel like more things should be catched at compile time, maybe by adding static_assert here and there to check that methods are implemented, or data structure supported.

arcanis avatar Dec 05 '16 10:12 arcanis

Your testcase should work, so there's a bug. It seems there's some problem with passing objects by value to C++, when they were initally allocated as pointers.

jjrv avatar Dec 05 '16 11:12 jjrv

Implementing toJS and fromJS seems to have fixed most of the issues, but my tests still fail (the native build pass correctly). Weird thing is that it only fail after a lot of successful iterations (about 145,000).

arcanis avatar Dec 05 '16 11:12 arcanis

I'll try to fix the first bug since it's more deterministic, and may be the underlying issue for your current tests failing later.

jjrv avatar Dec 05 '16 12:12 jjrv

Passing an object without a toJS method by value was simply totally unimplemented. It reserved a slot for temporarily holding the JavaScript object in an array (for calling its toJS method), and passed the slot number to C++... Which knew there's no fromJS method, assumed that then there's no toJS function either and so went straight on to dereference the slot index as if it was a valid pointer (objects with toJS / fromJS methods are passed through slots in a JavaScript array global within the asm.js module, while other objects are passed using pointers).

The increasing slot index made subsequent calls dereference different invalid pointers, changing the numbers. Sometimes I wish the asm.js VM supported segfaults...

This should now be fixed in master. I'll do a new release soon (have to wait for Travis to finish tests first), remember to erase the build directory when updating.

I'm afraid this doesn't sound like it had to do with your other issue, so there's probably something else too. Are you able to share the code for hunting the bug if needed?

How do the longer-running tests fail? Wrong output (what does it look like?) Crashes / hangs?

jjrv avatar Dec 05 '16 18:12 jjrv

Thanks a lot, Juha, your help is much appreciated! I've updated nbind, and it fixed my test repository, but apparently not my actual library since I did get different errors than "expected" (ie. the one I usually get) when removing the toJS and fromJS implementation (I may have messed up the nbind update, tho, unless those two functions were actually required for a different reason).

As for the issue I still have even with toJS / fromJS, you can check the code on arcanis/buffer-offset-index if you want to give it a look. I think a simple npm install && npm test should do the trick to build the library & run the tests. On my machine, the asm.js build fails with the following error:

  1) BufferOffsetIndex (asmjs) maps character indices to 2d points and viceversa:

      AssertionError: 355 == 1052
      + expected - actual

      -355
      +1052

      at Context.<anonymous> (/home/arcanis/buffer-offset-index/test/buffer-index.test.js:58:32)
      at callFn (/home/arcanis/buffer-offset-index/node_modules/mocha/lib/runnable.js:343:21)
      at Test.Runnable.run (/home/arcanis/buffer-offset-index/node_modules/mocha/lib/runnable.js:335:7)
      at Runner.runTest (/home/arcanis/buffer-offset-index/node_modules/mocha/lib/runner.js:444:10)
      at /home/arcanis/buffer-offset-index/node_modules/mocha/lib/runner.js:550:12
      at next (/home/arcanis/buffer-offset-index/node_modules/mocha/lib/runner.js:361:14)
      at /home/arcanis/buffer-offset-index/node_modules/mocha/lib/runner.js:371:7
      at next (/home/arcanis/buffer-offset-index/node_modules/mocha/lib/runner.js:295:14)
      at Immediate.<anonymous> (/home/arcanis/buffer-offset-index/node_modules/mocha/lib/runner.js:339:5)
      at runCallback (timers.js:649:20)
      at tryOnImmediate (timers.js:622:5)
      at processImmediate [as _immediateCallback] (timers.js:594:5)

More precisely, it fails at iteration === 29, i === 2, j === 0 (ie about 145,000 tests deep).

arcanis avatar Dec 05 '16 22:12 arcanis

Awesome! It was pretty easy to get the tests running (just had to install segfault-handler and change -O3 in nbind.gypi). For me, asm.js version of BufferOffsetIndex initially fails due to running out of heap space (the asm.js heap is a constant size). It prints:

Cannot enlarge memory arrays. Either (1) compile with -s TOTAL_MEMORY=X with X higher than the current value 16777216, (2) compile with -s ALLOW_MEMORY_GROWTH=1 which adjusts the size at runtime but prevents some optimizations, (3) set Module.TOTAL_MEMORY to a higher value before the program runs, or if you want malloc to return NULL (0) instead of this abort, compile with -s ABORTING_MALLOC=0

I added to nbind.gypi:

"-s", "TOTAL_MEMORY=134217728",

(between the other lines with "-s". Afterwards, the tests were successful.

I can understand why your tests would fail when it runs out of heap, but I have no idea why it wouldn't have printed a clear error message. That should be part of Emscripten's malloc implementation. My installed Emscripten version is 1.35.0, which is a bit old by now. I wonder if the newer version has an intentional or accidental change in this behavior.

Does setting the TOTAL_MEMORY fix the issue on your machine?

Are the test scripts in your Github master branch exactly the same as you're running locally?

jjrv avatar Dec 05 '16 23:12 jjrv

Oh! I actually had the same error as you, and set the -s ALLOW_MEMORY_GROWTH=1 option accordingly in an extra.gypi file that I then included into my binding.gyp (so that those flags are added after nbind's flags). However, this flag seems a bit buggy, because my tests fail with it but apparently pass fine with the flag you used, -s TOTAL_MEMORY=134217728! Do you observe the same (buggy) behaviour with ALLOW_MEMORY_GROWTH?

Maybe the bug's in emscripten, after all ... :)

arcanis avatar Dec 05 '16 23:12 arcanis

I get the same result. Using ALLOW_MEMORY_GROWTH is not recommended anyway, because it de-optimizes the entire asm.js module on compliant browsers.

The bug may indeed be in Emscripten. I have to investigate, because this may also be an opportunity to fix a hard to find bug in nbind.

jjrv avatar Dec 05 '16 23:12 jjrv