teavm
teavm copied to clipboard
Memory corruption when loading resource metadata in WebAssembly
When using the WebAssembly target, I've found that using resource metadata (e.g. Unicode tables) causes the subsequent GC to fail. Specifically, GC.sweep
either loops infinitely or makes an out-of-bounds memory access.
For example, this code will cause an infinite loop when compiled to Wasm and run:
public class Main {
public static void main(String[] args) {
Character.getNumericValue((int) '1');
System.gc();
}
}
I've spent some time debugging this, and found that it is related to resource objects created by MetadataIntrinsic
and stored in the data segment of the .wasm file. At runtime, references are created to those objects on the stack and/or heap, which confuses GC
, leading to the behavior described above.
The following patch to GC.java fixes the issue, although I'm not sure it's an ideal solution:
private static boolean isMarked(RuntimeObject object) {
- return (object.classReference & RuntimeObject.GC_MARKED) != 0
+ return object.toAddress().isLessThan(heapAddress())
+ || (object.classReference & RuntimeObject.GC_MARKED) != 0
|| (!isFullGC && (object.classReference & RuntimeObject.GC_OLD_GENERATION) != 0);
}
This is not a good solution:
- There's also C backend, which does not guarantee, that data segment address is less than heap address.
- GC is performance critical part, and all extra checks should be avoided.
The proper solution is to find out, why isMarked
is called for such objects. There are three options:
- It's a GC stack root.
- It's referenced from heap object.
- It's referenced from heap array.
All three should be carefully avoided. To prevent resource object being a stack root, TeaVM should check which references have resource types and prevent them from adding to shadow stack. As for heap objects, TeaVM should check field types, and prevent fields, referencing resources, from being added to GC fields offsets in class metadata. The latest case is simplest: creation of resource arrays should be avoided in the code. To help that, it's possible to write an analysis that reports error when hits corresponding case in the code.
Pushed commit that attempts to fix the issue. Please, check
Pushed commit that attempts to fix the issue. Please, check
Thanks. I just tried your commit, but the example I gave above (Character.getNumericValue
followed by System.gc
) still causes an infinite loop in GC.sweep
.
For reference, I'm using the Main.java
I gave above, plus this index.html
:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>foo</title>
<script src="classes.wasm-runtime.js" type="text/javascript"></script>
</head>
<body>
</body>
</html>
Then, in my web browser (I'm using Firefox, but it should be reproducible in any browser), I load the page and run this from the developer console: await TeaVM.wasm.load("classes.wasm").then(teavm => teavm.main())
. That never finishes, and eventually Firefox times it out with the resulting stack trace:
Script terminated by timeout at:
meth_otr_GC_objectSize@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[77]:0x29fc
meth_otr_GC_sweep@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[41]:0x13eb
meth_otr_GC_doCollectGarbage@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[17]:0x921
meth_otr_GC_collectGarbageImpl@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[16]:0x86e
meth_otr_GC_collectGarbageFullImpl@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[14]:0x832
teavm_gc_collectFull@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[13]:0x826
meth_jl_System_gcLowLevel@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[92]:0x336b
meth_jl_System_gc@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[91]:0x3364
meth_Main_main@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[189]:0x6089
meth_otr_Fiber_lambda_startMain_0@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[180]:0x5f00
meth_otr_Fiber_startMain_lambda__25_0_run@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[154]:0x4e95
meth_otr_Fiber_start_0@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[179]:0x5e5c
meth_otr_Fiber_start@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[177]:0x5d8a
meth_otr_Fiber_startMain@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[178]:0x5dfd
teavm_start@http://localhost:8000/classes.wasm-runtime.js line 194 > WebAssembly.instantiate:wasm-function[288]:0x6ac6
wrapExport/TeaVM.wasm@http://localhost:8000/classes.wasm-runtime.js:147:29
createMain/TeaVM.wasm/TeaVM.wasm<@http://localhost:8000/classes.wasm-runtime.js:223:73
createMain/TeaVM.wasm@http://localhost:8000/classes.wasm-runtime.js:207:20
@debugger eval code:2:62
promise callback*@debugger eval code:2:42
@debugger eval code:3:3
Ok, tried another attempt
Ok, tried another attempt
Thanks again. That seems to have fixed the test case I gave above. However, here's a more complicated example which is still broken: https://github.com/dicej/teavm-memory/blob/ae74daff091d44c66a7e9c92d0f6203655db4b36/src/main/java/Main.java
I'm running that example using await TeaVM.wasm.load("classes.wasm").then(teavm => teavm.main(["1000"]))
, and it works fine for about the first 180 iterations, but then crashes immediately after the first GC with an invalid table access. Note that it doesn't seem to crash during GC this time -- just afterward. So perhaps you did fix the original issue, but it seems that the fix may have introduced a regression which affects the Wasm function table(s).
Then I don't know. But I can't accept your patch, since it's incorrect. C backend uses the same GC and works without problems. So the problem is not in GC, but somewhere around it. My advice: please, look at assertions C backend inserts into code to check GC (I think there was some switch or system property to enable generation of these assertions) and try to recreate similar in Wasm. Without it it's almost impossible to debug GC. Personally, I don't have time and motivation to find bugs in Wasm BE.
No worries -- thanks for looking into it. I'll post a PR if I can come up with a better solution.
Not sure if further errors are caused by resource loading. I started to port your changes to this repository. I think that heap is corrupted by interop. The pattern with Address.of(array)
is slightly misused there. In my C runtime implementation I use this technique to fill a byte buffer, so pass address of data to the buffer, and then read from Java array. But WASI support uses acquired pointer to traverse the results, which is not correct. Since managed array is not used after the call, compiler can remove it from stack roots and thus make it relocatable. So I implement Address.pin
method to make sure that array is not relocated until native pointer is in use, and check if it helps. Of course, the proper way to debug GC is to implement assertions - this should not be very hard.
Thanks for pointing that out -- Address.pin
sounds useful. And if that doesn't help, I can also switch back to allocating buffers using Memory.malloc
and copying the array contents into those buffers before traversing.
BTW, the "invalid table" errors I mentioned above were happening on the master branch of this repository (which does not yet have any WASI code in it).
Looks like I fixed the issue. Both examples (Character.getNumericValue and PI) work fine for me.
Thanks, @konsoletyper! This issue appears to be fixed, although the pi digits test still fails after about 7660 iterations due to https://github.com/konsoletyper/teavm/issues/634.