InflateState::new_boxed() uses large stack allocation
I noticed a comment in the InflateState::new() function:
Note that this struct is quite large due to internal buffers, and as such storing it on the stack is not recommended.
The size of the struct is about 43KB. Probably not a huge issue for Windows or Linux desktops where you have 1MB+ of stack space but maybe an issue for embedded platforms.
The solution seems to be to use the new_boxed() function. However, this results in a temporary stack allocation of the entire size of the InflateState struct regardless. The function calls Box::default() to do the heap allocation which has a definition of
fn default() -> Self {
Box::new(T::default())
}
This first constructs InflateState on the stack and then moves it to the heap. I was able to confirm this by using rust-gdb to examine the assembly code. I will note that a release build on x86_64 Linux appears to optimize this to first allocate the memory and then initialize directly on the heap but a debug build does not. Other platforms also may not have this optimization.
The only solutions I can think of off the top of my head is unsafe code doing manual allocation or moving some of the larger fixed size arrays to Vec. Neither is really ideal since it seems to be a goal of this project to not use unsafe and Vec would result in some additional overhead. Still figured I would raise this issue in case someone has a better idea.
Yeah as far as I know rust still lacks a way to directly heap allocate a struct that is guaranteed to avoid the stack even in debug mode without resorting to unsafe (but feel free to correct me if I'm wrong.) I think it may be possible to allocate the array buffers separately and keep the static length information now that const generics are in the language though I don't know if that's optimal performance wise.
The reason for using default in that function was that in the past it used to used the former box keyword internally which did a direct heap allocation but that was removed a long time ago. Back then the compiler also struggled to optimize out allocation in release mode so using Box::default() was a workaround to avoid that but it's a bit of a historical artifact that no longer makes sense that should probably just be changed to Box::new in the code. (Will have to check what rust version it was changed in but it might have been years now.)
If the compiler can optimize out the stack allocation on one platform it very likely can on other platforms too though, as LLVM does much of the optimization on LLVM intermediate representation that will be mostly the same between platforms (at least with portable code like this that doesn't interface with any system stuff), before it's finally converted down to machine code.
It should also be doable to reduce the size of the compressor and decompressor structs a tiny bit as right the huffman tables in the structs used for the distances and huffman length codes are way larger than they need to be for code simplicity, something inherited from the original miniz.
One could add bytemuck as a (feature gated?) dependency and derive the Zeroable trait to use zeroed_box. Is there a policy to not use external dependencies at all if they use unsafe or is it decided on a case by case basis?
We already do that with simd_adler32. I think it's acceptable to do it behind optional features if it provides some significant benefit (or has some other demonstrable use case in case it's not for performance but a feature or something)