
Add the ability to skip CustomSections

Open Plecra opened this issue 2 years ago • 5 comments

Because custom sections have an unbounded size, determined only by the producing implementation, it'd be great to have a CustomSectionStart payload which would give the caller the opportunity to skip the custom section when it is large.
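Something along these lines, perhaps (a purely hypothetical sketch: these variant names and fields don't exist in wasmparser, they're just the shape of the request):

```rust
// Hypothetical sketch only: neither variant exists in wasmparser today.
enum Payload<'a> {
    // ...existing variants...

    /// Announces a custom section before any of its contents are
    /// buffered, so the caller can decide to skip it by name and size.
    CustomSectionStart { name: &'a str, size: u64 },
    /// Incremental chunks of the contents, for callers that don't skip.
    CustomSectionChunk { data: &'a [u8] },
}
```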

Plecra avatar Feb 07 '22 22:02 Plecra

Seems reasonable to me! Would you be up for making a PR to do this?

alexcrichton avatar Feb 08 '22 15:02 alexcrichton

I’m trying to understand the issue here, and I think the only truly problematic part is that the name of the custom section is eagerly validated as UTF-8. The data contained within the section is not actually inspected in any way, so simply ignoring the payload already trivially skips the section's contents as well?

Is my understanding of the issue correct?

nagisa avatar Sep 07 '22 11:09 nagisa

The problem, as I understand it, is that the current API forces the entire custom section to be buffered in memory and handed to the Parser just so it can be ignored. The actual operation of ignoring the contents is trivial, and parsing the custom section is trivial too, but the API decision to force the entire section into one contiguous region of memory is, I think, what should be fixed (or something like that)
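To make the buffering concrete, here's a minimal sketch of the streaming loop (assuming wasmparser's Parser/Chunk API; the exact Payload variant shapes have varied across versions). The parser answers NeedMoreData until the entire custom section is resident in the buffer, and only then hands it back as a single payload:

```rust
use std::io::Read;
use wasmparser::{Chunk, Parser, Payload};

fn scan(mut input: impl Read) -> Result<(), Box<dyn std::error::Error>> {
    let mut parser = Parser::new(0);
    let mut buf = Vec::new();
    let mut eof = false;
    loop {
        let (consumed, payload) = match parser.parse(&buf, eof)? {
            // For a large custom section this arm keeps firing until the
            // *whole* section is in `buf`, even if we intend to ignore it.
            Chunk::NeedMoreData(hint) => {
                let len = buf.len();
                buf.resize(len + hint as usize, 0);
                let n = input.read(&mut buf[len..])?;
                buf.truncate(len + n);
                eof = n == 0;
                continue;
            }
            Chunk::Parsed { consumed, payload } => (consumed, payload),
        };
        match payload {
            // By the time this payload exists, the full section contents
            // are already buffered contiguously.
            Payload::CustomSection(_) => {}
            Payload::End(_) => break,
            _ => {}
        }
        buf.drain(..consumed);
    }
    Ok(())
}
```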

alexcrichton avatar Sep 07 '22 15:09 alexcrichton

Ohhh, I see, yeah.


I gave this buffering issue some thought while taking a walk and realized it's an issue with all the other kinds of sections as well, isn't it? It is entirely plausible to have a huge data section, or a huge types section, or even a function with a very large body (despite the fact that the code section is already split up into the “start” and “entry” Payloads). Due to how Readers are structured today, any entity returned by a Parser as a Payload needs to be backed by a single linear buffer in memory.

In order to “properly” fix this in general, we’d need to effectively move all of the functionality in the readers fully into the Parser and its state machine, wouldn’t we? API-wise this might be more palatable if it looked somewhat like the visitor API that was recently introduced for operators. Otherwise having, for example, Payload::ImportSectionStart, Payload::ImportSectionEntryModuleNameChunk, Payload::ImportSectionEntryNameChunk, and Payload::ImportSectionEntryTypeRef just for import sections would get unwieldy quite quickly (roughly the sketch below).
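Illustrative only; none of these variants exist, and TypeRef just stands in for the import's descriptor:

```rust
use wasmparser::TypeRef;

// Illustrative only: the chunked variants the import section alone would
// need if all of the readers' work moved into the Parser state machine.
enum Payload<'a> {
    ImportSectionStart { count: u32 },
    ImportSectionEntryModuleNameChunk { chunk: &'a [u8] },
    ImportSectionEntryNameChunk { chunk: &'a [u8] },
    ImportSectionEntryTypeRef { ty: TypeRef },
    // ...and similarly for every other section and nested entity...
}
```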

Either that or change Readers to be more like a Parser…?

nagisa avatar Sep 08 '22 10:09 nagisa

My original intention was to avoid refactoring the whole crate to use Reader while also avoiding the possible performance pitfalls of a one-byte-at-a-time approach. To that end the Parser type is sort of a "chunked" parser where it makes the decisions about what must be buffered contiguously in memory, which ideally stays reasonable in the long run. For example most sections of a wasm module are very small and quite reasonable to buffer in memory. Individual functions can be large, but AFAIK it's a common assumption that a function's bytes are contiguous when parsing it.

The split I designed with Parser was that "major" sections could be split up: each individual section can live in its own separate region of memory, and each function can additionally be in its own region (no need for the whole code section to be resident). I suspect it would be good to give the data section similar treatment as well, though, because it can often be quite large.
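For reference, a small sketch of that split, using today's non-streaming parse_all for brevity (and assuming current Payload variant shapes): the code section header arrives as CodeSectionStart, then each function body arrives as its own CodeSectionEntry, so only one function's bytes need to be treated as a unit at a time:

```rust
use wasmparser::{Parser, Payload};

fn walk_code(wasm: &[u8]) -> wasmparser::Result<()> {
    for payload in Parser::new(0).parse_all(wasm) {
        match payload? {
            Payload::CodeSectionStart { count, .. } => {
                // Header only: no function bodies have been touched yet.
                println!("code section with {count} functions");
            }
            Payload::CodeSectionEntry(body) => {
                // `body` spans just this one function, not the whole section.
                let locals = body.get_locals_reader()?.get_count();
                println!("function declaring {locals} groups of locals");
            }
            _ => {}
        }
    }
    Ok(())
}
```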

Overall, though, the Parser type was a best-effort idea on my part to make a streaming parser that's performant and doesn't require massive refactorings, either internally or for users. I don't know if anyone's actually using it in a streaming context, so I don't think it's ever really proven itself as a good design.

alexcrichton avatar Sep 08 '22 14:09 alexcrichton