wasm-tools
Add the ability to skip CustomSections
Because custom sections have an unbounded size, determined only by the actual implementation, it'd be great to have a `CustomSectionStart` payload which would give the caller the opportunity to skip the custom section when it is large.
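A minimal sketch of what the requested API could look like. Note that `CustomSectionStart` and `should_skip` are hypothetical names invented here, not anything wasmparser exposes today; the idea is just that the parser announces the section's name and size up front, and the caller decides whether to consume or skip the payload:

```rust
// Hypothetical payload variant: announces a custom section without
// buffering its body. Not part of wasmparser's real `Payload` enum.
enum Payload<'a> {
    CustomSectionStart { name: &'a str, size: u64 },
    // ...other payload kinds elided...
}

// Caller-side policy: skip any custom section larger than a threshold,
// e.g. by seeking the underlying input forward by `size` bytes.
fn should_skip(payload: &Payload<'_>, limit: u64) -> bool {
    match payload {
        Payload::CustomSectionStart { size, .. } => *size > limit,
    }
}

fn main() {
    let dwarf = Payload::CustomSectionStart { name: ".debug_info", size: 64 << 20 };
    let names = Payload::CustomSectionStart { name: "name", size: 512 };
    assert!(should_skip(&dwarf, 1 << 20));
    assert!(!should_skip(&names, 1 << 20));
}
```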
Seems reasonable to me! Would you be up for making a PR to do this?
I’m trying to understand the issue here, and I think the only truly problematic part is that the name of the custom section is eagerly validated as UTF-8. The data contained within the section is not actually inspected in any way, so ignoring the payload effectively skips the section payload as well, doesn’t it?
Is my understanding of the issue correct?
The problem, as I understand it, is that the current API forces the entire custom section to be buffered in memory and handed through the `Parser` just to be ignored. The actual operation of ignoring the contents is trivial, and parsing the custom section is indeed trivial too, but the API decision to force buffering the entire section in a contiguous region of memory is what I think should be fixed (or something like that).
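To illustrate why skipping is cheap when the API allows it, here's a toy sketch (not wasmparser's actual code): a wasm section header is one id byte plus a LEB128-encoded byte length, so a reader over seekable input can step past the payload without ever allocating a buffer for it:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Decode an unsigned LEB128 value, as used for wasm section sizes.
fn read_leb128_u32(r: &mut impl Read) -> u32 {
    let (mut result, mut shift) = (0u32, 0);
    loop {
        let mut byte = [0u8; 1];
        r.read_exact(&mut byte).unwrap();
        result |= ((byte[0] & 0x7f) as u32) << shift;
        if byte[0] & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

// Read a section header, then skip its payload with a seek: no
// `size`-byte buffer is ever allocated.
fn skip_section(r: &mut (impl Read + Seek)) -> (u8, u32) {
    let mut id = [0u8; 1];
    r.read_exact(&mut id).unwrap();
    let size = read_leb128_u32(r);
    r.seek(SeekFrom::Current(size as i64)).unwrap();
    (id[0], size)
}

fn main() {
    // Section id 0 (custom), size 300 as LEB128 (0xac 0x02), then 300
    // payload bytes we never materialize as a parsed value.
    let mut bytes = vec![0x00, 0xac, 0x02];
    bytes.extend(std::iter::repeat(0u8).take(300));
    let mut cur = Cursor::new(bytes);
    let (id, size) = skip_section(&mut cur);
    assert_eq!((id, size), (0, 300));
    assert_eq!(cur.position(), 303);
}
```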
Ohhh, I see, yeah.
I gave this buffering issue a think while taking a walk and realized this is an issue with all the other kinds of sections as well, isn’t it? It is entirely plausible to have a huge data section, or a huge type section, or even a function with a very large body (despite the fact that the code section is already split up into the “start” and “entry” `Payload`s). Due to how `Reader`s are structured today, any entity returned by a `Parser` as a `Payload` needs to be a linear buffer in memory.
In order to “properly” fix this in general, we’d need to effectively move all of the functionality in readers fully into the `Parser` and its state machine, wouldn’t we? API-wise this might be more palatable if it looked somewhat like the visitor API that was recently introduced for the operands. Otherwise, having, for example, `Payload::ImportSectionStart`, `Payload::ImportSectionEntryModuleNameChunk`, `Payload::ImportSectionEntryNameChunk`, and `Payload::ImportSectionEntryTypeRef` just for import sections would get unwieldy quite quickly.
Either that, or change `Reader`s to be more like a `Parser`…?
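A rough sketch of what such a visitor-style alternative could look like. All names here are invented for illustration (this is not an API wasmparser exposes): instead of one `Payload` variant per event, the parser drives a trait whose methods receive each piece, possibly in chunks, so nothing has to be contiguous in memory:

```rust
// Hypothetical visitor for import-section events. A streaming parser
// would call these methods as bytes arrive.
trait ImportVisitor {
    fn section_start(&mut self, count: u32);
    // Name data may arrive in several chunks, so a single name never
    // needs to live in one linear buffer.
    fn module_name_chunk(&mut self, chunk: &[u8]);
    fn name_chunk(&mut self, chunk: &[u8]);
}

// Example implementation that just accumulates what it is handed.
struct CollectNames {
    names: Vec<u8>,
    count: u32,
}

impl ImportVisitor for CollectNames {
    fn section_start(&mut self, count: u32) {
        self.count = count;
    }
    fn module_name_chunk(&mut self, chunk: &[u8]) {
        self.names.extend(chunk);
    }
    fn name_chunk(&mut self, chunk: &[u8]) {
        self.names.extend(chunk);
    }
}

fn main() {
    let mut v = CollectNames { names: Vec::new(), count: 0 };
    // A driver (the parser) would invoke these as the input streams in:
    v.section_start(1);
    v.module_name_chunk(b"env");
    v.name_chunk(b"memory");
    assert_eq!(v.count, 1);
    assert_eq!(v.names, b"envmemory");
}
```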
My original intention was to avoid refactoring the whole crate to use `Reader` and additionally to avoid the possible performance pitfalls of a one-byte-at-a-time approach. To that end the `Parser` type is sort of a "chunked" parser: it makes decisions about what must be buffered and contiguous in memory, which ideally is pretty reasonable in the long run. For example, most sections of a wasm module are very small and quite reasonable to buffer in memory. Individual functions can be large, but AFAIK it's a common assumption that a function body is contiguous when parsing it.
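The "chunked" contract described above can be modeled with a toy parser (loosely shaped like wasmparser's real `Chunk` type, though simplified here): on each call the parser either reports how many more bytes it needs, or consumes a prefix it decided must be contiguous:

```rust
// Simplified result type: either the parser needs more input, or it
// consumed a complete, contiguous unit from the front of `data`.
enum Chunk {
    NeedMoreData(u64),
    Parsed { consumed: usize },
}

// Pretend parser where a unit is a length-prefixed record: one length
// byte followed by that many body bytes. Only the record itself must be
// contiguous, never the whole stream.
fn parse(data: &[u8]) -> Chunk {
    match data.first() {
        None => Chunk::NeedMoreData(1),
        Some(&len) => {
            let total = 1 + len as usize;
            if data.len() < total {
                Chunk::NeedMoreData((total - data.len()) as u64)
            } else {
                Chunk::Parsed { consumed: total }
            }
        }
    }
}

fn main() {
    assert!(matches!(parse(&[]), Chunk::NeedMoreData(1)));
    assert!(matches!(parse(&[3, 0xaa]), Chunk::NeedMoreData(2)));
    assert!(matches!(parse(&[3, 0xaa, 0xbb, 0xcc]), Chunk::Parsed { consumed: 4 }));
}
```

The caller owns the buffering strategy: it feeds whatever bytes it has, and only grows its buffer when the parser explicitly asks for more.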
The split I designed with `Parser` was that "major" sections could be split up. For example, each individual section can live in its own separate region of memory, and each function can additionally be in its own region (no need for the whole code section to be resident). I suspect it would be good to give similar treatment to the data section as well, though, because it can often be quite large.
Overall, though, the `Parser` type was a best-effort idea on my part to make a streaming parser that's both performant and doesn't require massive refactorings, internally or externally, to use. I don't know if anyone's actually using it in a streaming context, so I don't think it's ever actually proven itself as a good design.