wabt
Implement GC basics
Parsing / reading / writing of the core structures (rec/sub) is mostly complete; validation is far from done.
I have reworked the type validation system of the patch. It is now capable of finding, for every type, the first type index whose type is equal to it; this first index is called the canonical index. If I have two types t1/t2 and their canonical indices are computed, then type comparison is simply t1.canonical_index == t2.canonical_index. Sub-type indices can also be turned into canonical sub indices. This is not only useful for validation, but also very important for high-speed execution, since it simplifies type comparison a lot. To compute these canonical indices, a hash code is computed for each type: when two types have different hash codes, they can never be equal. My hash computation algorithm might not be good, as I don't have much experience with such algorithms.
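Roughly, the idea is something like the sketch below. This is not the actual patch code; the Type representation is simplified, the names are made up, and the rec-group subtleties are ignored.

// Hypothetical sketch of hash-based type canonicalisation; illustrative only.
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Type {
  std::vector<int> desc;  // some structural encoding of the type
  size_t hash = 0;        // precomputed structural hash

  bool StructurallyEqual(const Type& other) const { return desc == other.desc; }
};

// canonical_index[i] is the lowest index whose type equals types[i].
std::vector<size_t> ComputeCanonicalIndices(const std::vector<Type>& types) {
  std::vector<size_t> canonical_index(types.size());
  // Bucket earlier indices by hash; types with different hashes can never be equal.
  std::unordered_map<size_t, std::vector<size_t>> buckets;
  for (size_t i = 0; i < types.size(); ++i) {
    size_t canon = i;
    for (size_t candidate : buckets[types[i].hash]) {
      if (types[candidate].StructurallyEqual(types[i])) {
        canon = canonical_index[candidate];
        break;
      }
    }
    canonical_index[i] = canon;
    buckets[types[i].hash].push_back(i);
  }
  return canonical_index;
}

// Type comparison then reduces to:
//   canonical_index[t1] == canonical_index[t2]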
The type-* gc tests are passing, except for the runtime part of type-subtyping.wast.
I think the typing system in the validator and the interpreter is OK now. This is another huge change, with 1500 lines of new code.
It sounds like you are canonicalising wrt type indices. But type indices are meaningless in any program that consists of more than one module. Type canonicalisation must happen globally, across module boundaries, based on the types' structure. I suspect that is the reason for the link/run-time tests failing.
The link tests are not failing, although the interpreter does a slow type comparison for imports/exports. As far as I understand, the interpreter here is just a demonstration, so this is probably OK. The runtime tests fail because the operations (such as ref.cast) are not implemented. I will do that in a follow-up patch.
The global type canonicalisation sounds like a very good idea! A high performance engine should do that!
@sbc100 there is a fuzzer failure triggered by this code, although the code itself is correct.
https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L772 As for the fuzzer-generated test, it wants to allocate 16190847 entries, which is pretty large for a 38-byte input, but not an invalid value in general.
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/FuzzerLoop.cpp
LLVM considers this a large value and reports it as an error. There is an -rss_limit_mb option to modify this limit.
What shall I do?
We don't tend to have time to worry about fixing all the fuzz test issues, unless they could conceivably show up in real-world programs. That is, we tend to assume trusted and safe inputs, since we don't have the resources to harden wabt against other things.
Having said that, we would obviously be happy to accept fixes for such issues if folks come up with them.
There is nothing to fix here; the code is correct (and not related to this patch). It is simply a limitation of the fuzzer: it assumes that a very large memory allocation is likely a bug.
Is there a compile-time define (or something that can be set during the build) to detect that the fuzzer is active? Then I could guard the failing .reserve call. Generating huge numbers is perfectly valid for a fuzzer.
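The guard I have in mind would look roughly like the helper below. It assumes the fuzzer build defines FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION (the macro the libFuzzer documentation recommends for fuzzing builds; whether wabt's fuzzer build defines it is an assumption), and kMaxFuzzReserve is just an invented cap.

// Hypothetical guard around a speculative reserve; illustrative only.
#include <cstddef>
#include <vector>

template <typename T>
void ReserveChecked(std::vector<T>& vec, size_t count) {
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Skip the up-front reserve for huge counts so the fuzzer's allocation
  // limit is not tripped; the vector still grows on demand when elements
  // are actually appended.
  constexpr size_t kMaxFuzzReserve = 1 << 16;  // invented cap, not a wabt constant
  if (count > kMaxFuzzReserve) return;
#endif
  vec.reserve(count);
}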
I have checked the type hashing system across the whole test suite. It performs 45157 comparisons, and the hash and type count match 513 times. Of those, the types are really equal 467 times and not equal 46 times. Overall the efficiency is 99.9%.
I could not improve the hash by using more complex hash functions; however, (hash * 33) + value achieves the same efficiency.
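For reference, a minimal sketch of that combine (illustrative only, not the patch's actual hashing code):

// djb2-style multiply-and-add combine, as described above.
#include <cstddef>
#include <cstdint>
#include <vector>

size_t CombineHash(size_t hash, uint32_t value) {
  return hash * 33 + value;
}

size_t HashTypeFields(const std::vector<uint32_t>& fields) {
  size_t hash = 5381;  // common djb2 seed; the patch's actual seed is an implementation detail
  for (uint32_t value : fields)
    hash = CombineHash(hash, value);
  return hash;
}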
I will check the failing cases; maybe that gives some ideas.
Edit: now the efficiency is 100%. That does not mean it is perfect or anything, but the missing cases were valid and worth improving the code for.
@sbc100 This is the first patch of the GC support. It triggers the fuzzer failure I described above. Reserving memory is correct for valid wasm files; however, a random value generated by the fuzzer causes a large memory allocation, which the fuzzer considers an error. The fuzzer has an option to raise this allocation limit. What do you suggest?
Note: the issue is visible in the fuzzer backtrace:
#11 0x56421033ba2f in __allocate_at_least<std::__1::allocator<wabt::interp::DataDesc> > /usr/local/bin/../include/c++/v1/__memory/allocate_at_least.h:41:19
#12 0x56421033ba2f in __split_buffer /usr/local/bin/../include/c++/v1/__split_buffer:330:25
#13 0x56421033ba2f in std::__1::vector<wabt::interp::DataDesc, std::__1::allocator<wabt::interp::DataDesc>>::reserve(unsigned long) /usr/local/bin/../include/c++/v1/__vector/vector.h:1109:49
#14 0x564210325038 in wabt::interp::(anonymous namespace)::BinaryReaderInterp::OnDataCount(unsigned int) /src/wabt/src/interp/binary-reader-interp.cc:925:17
#15 0x5642103d8474 in wabt::(anonymous namespace)::BinaryReader::ReadDataCountSection(unsigned long) /src/wabt/src/binary-reader.cc:3113:3
OnDataCount gets a huge number, runs .reserve(), and aborts.