ml-dsa: stack usage causes overflow on Windows due to 1MB default stack size
Hi, thanks for your work on this crate!
We're using the ml-dsa crate in a Windows environment and ran into a significant issue: the crate performs all of its operations on the stack (notably expansion), which leads to a stack overflow under the default 1MB stack size of Windows applications.
This creates problems for consumers of the crate—particularly in libraries like MLA (ANSSI-FR/MLA)—as we’re forced to work around the issue by spawning threads with larger stack sizes. This workaround is non-idiomatic and introduces additional complexity, as seen here:
https://github.com/ANSSI-FR/MLA/blob/23b99ae0441867b4a7a709019f09c35af180634d/mla/src/crypto/mlakey.rs#L207
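For reference, the workaround amounts to something like the following (a minimal sketch; `generate_keypair` is a placeholder for the ML-DSA-heavy code path, not MLA's actual function):

```rust
use std::thread;

// Placeholder for the ML-DSA-heavy code path (e.g. key generation/expansion).
fn generate_keypair() -> Vec<u8> {
    Vec::new()
}

fn main() {
    // Run the stack-hungry work on a thread with an explicitly larger stack
    // (here 8 MiB) instead of relying on the 1 MiB default for Windows apps.
    let handle = thread::Builder::new()
        .stack_size(8 * 1024 * 1024)
        .spawn(generate_keypair)
        .expect("failed to spawn worker thread");

    let _keys = handle.join().expect("worker thread panicked");
}
```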
Proposed solutions or suggestions:
- Consider moving large allocations to the heap where possible (e.g., using `Box` or `Vec`), especially for operations requiring large buffers or temporary values.
- Alternatively, make stack usage configurable behind a feature flag, so `no_std` users can still opt into full stack usage while `std` users benefit from heap allocation (a rough sketch follows below).
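To make the second option more concrete, here is a rough sketch of what gating heap offload on a feature could look like; `with_expansion_buffer` and `EXPANDED_LEN` are made-up names for illustration, not the crate's actual API:

```rust
#[cfg(feature = "alloc")]
extern crate alloc;

const EXPANDED_LEN: usize = 64 * 1024; // placeholder buffer size

#[cfg(feature = "alloc")]
fn with_expansion_buffer<R>(f: impl FnOnce(&mut [u8]) -> R) -> R {
    // `vec!` allocates directly on the heap, so the large working buffer
    // never touches the stack.
    let mut buf = alloc::vec![0u8; EXPANDED_LEN];
    f(&mut buf)
}

#[cfg(not(feature = "alloc"))]
fn with_expansion_buffer<R>(f: impl FnOnce(&mut [u8]) -> R) -> R {
    // Current behaviour: the whole working buffer lives on the stack.
    let mut buf = [0u8; EXPANDED_LEN];
    f(&mut buf)
}
```

Internals would then write into the buffer handed to the closure, and the choice of stack vs. heap stays invisible to callers.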
Let me know if you’d like help profiling where the main stack usage occurs, or if you’d be open to a PR adding heap-based allocation support.
Thanks again!
I'd first be curious if there are ways to improve the stack usage that don't involve heap offload. Consuming 1MB of stack seems like quite a bit to me right off the bat. A cursory googling suggests other implementations are consuming ~14kB.
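One implementation-agnostic pattern that often drives stack usage in ML-DSA-style code is returning large arrays of polynomials by value, which can keep several temporary copies live at once; writing into a caller-provided buffer avoids that without any heap use. A purely illustrative sketch (sizes are for ML-DSA-87, names are made up, and this is not based on reading this crate's internals):

```rust
const K: usize = 8;            // rows of the matrix A for ML-DSA-87
const L: usize = 7;            // columns of A
const N: usize = 256;          // polynomial degree

type Poly = [u32; N];          // 1 KiB per polynomial
type MatrixA = [[Poly; L]; K]; // 56 KiB for the expanded matrix

// By-value style: the matrix is built in a local and then moved/copied out,
// so the caller's frame and the callee's frame can each hold a 56 KiB copy.
fn expand_a_by_value(seed: &[u8; 32]) -> MatrixA {
    let _ = seed;
    // ... fill from SHAKE(seed) ...
    [[[0u32; N]; L]; K]
}

// In-place style: the caller decides where the matrix lives (stack or heap),
// and only one copy exists at a time.
fn expand_a_into(seed: &[u8; 32], a: &mut MatrixA) {
    let _ = (seed, a);
    // ... fill `a` from SHAKE(seed) in place ...
}
```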
That said, there is already an `alloc` feature, and we can potentially gate heap offload on that.
cc @bifurcation