ml-dsa: stack usage causes overflow on Windows due to 1MB default stack size
Hi, thanks for your work on this crate!
We're using the ml-dsa crate in a Windows environment and ran into a significant issue: the crate performs all of its operations on the stack (notably expansion), which leads to a stack overflow under the default 1MB stack size of Windows applications.
This creates problems for consumers of the crate—particularly in libraries like MLA (ANSSI-FR/MLA)—as we’re forced to work around the issue by spawning threads with larger stack sizes. This workaround is non-idiomatic and introduces additional complexity, as seen here:
https://github.com/ANSSI-FR/MLA/blob/23b99ae0441867b4a7a709019f09c35af180634d/mla/src/crypto/mlakey.rs#L207
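For reference, the workaround amounts to something like the following (a minimal sketch; `generate_keypair` is a placeholder for the ML-DSA-heavy code path, not MLA's actual function):

```rust
use std::thread;

// Placeholder for the ML-DSA-heavy code path (e.g. key generation/expansion).
fn generate_keypair() -> Vec<u8> {
    Vec::new()
}

fn main() {
    // Run the stack-hungry work on a thread with an explicitly larger stack
    // (here 8 MiB) instead of relying on the 1 MiB default for Windows apps.
    let handle = thread::Builder::new()
        .stack_size(8 * 1024 * 1024)
        .spawn(generate_keypair)
        .expect("failed to spawn worker thread");

    let _keys = handle.join().expect("worker thread panicked");
}
```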
Proposed solutions or suggestions:
- Consider moving large allocations to the heap where possible (e.g., using `Box` or `Vec`), especially for operations requiring large buffers or temporary values.
- Alternatively, make stack usage configurable behind a feature flag, so `no_std` users can still opt into full stack usage while `std` users benefit from heap allocation (a rough sketch follows below).
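To make the second option more concrete, here is a rough sketch of what gating heap offload on a feature could look like; `with_expansion_buffer` and `EXPANDED_LEN` are made-up names for illustration, not the crate's actual API:

```rust
#[cfg(feature = "alloc")]
extern crate alloc;

const EXPANDED_LEN: usize = 64 * 1024; // placeholder buffer size

#[cfg(feature = "alloc")]
fn with_expansion_buffer<R>(f: impl FnOnce(&mut [u8]) -> R) -> R {
    // `vec!` allocates directly on the heap, so the large working buffer
    // never touches the stack.
    let mut buf = alloc::vec![0u8; EXPANDED_LEN];
    f(&mut buf)
}

#[cfg(not(feature = "alloc"))]
fn with_expansion_buffer<R>(f: impl FnOnce(&mut [u8]) -> R) -> R {
    // Current behaviour: the whole working buffer lives on the stack.
    let mut buf = [0u8; EXPANDED_LEN];
    f(&mut buf)
}
```

Internals would then write into the buffer handed to the closure, and the choice of stack vs. heap stays invisible to callers.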
Let me know if you’d like help profiling where the main stack usage occurs, or if you’d be open to a PR adding heap-based allocation support.
Thanks again!
I'd first be curious if there are ways to improve the stack usage that don't involve heap offload. Consuming 1MB of stack seems like quite a bit to me right off the bat. A cursory googling suggests other implementations are consuming ~14kB.
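One implementation-agnostic pattern that often drives stack usage in ML-DSA-style code is returning large arrays of polynomials by value, which can keep several temporary copies live at once; writing into a caller-provided buffer avoids that without any heap use. A purely illustrative sketch (sizes are for ML-DSA-87, names are made up, and this is not based on reading this crate's internals):

```rust
const K: usize = 8;            // rows of the matrix A for ML-DSA-87
const L: usize = 7;            // columns of A
const N: usize = 256;          // polynomial degree

type Poly = [u32; N];          // 1 KiB per polynomial
type MatrixA = [[Poly; L]; K]; // 56 KiB for the expanded matrix

// By-value style: the matrix is built in a local and then moved/copied out,
// so the caller's frame and the callee's frame can each hold a 56 KiB copy.
fn expand_a_by_value(seed: &[u8; 32]) -> MatrixA {
    let _ = seed;
    // ... fill from SHAKE(seed) ...
    [[[0u32; N]; L]; K]
}

// In-place style: the caller decides where the matrix lives (stack or heap),
// and only one copy exists at a time.
fn expand_a_into(seed: &[u8; 32], a: &mut MatrixA) {
    let _ = (seed, a);
    // ... fill `a` from SHAKE(seed) in place ...
}
```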
That said, there is already an `alloc` feature, and we can potentially gate heap offload on that.
cc @bifurcation