Stack overflow for deep expressions in bincode serialization/deserialization
With rust code containing deep expressions, hax fails with a stack overflow. For example:
fn main() -> i32 {
1 + 1 + 1 + 1 + 1 + 1 + 1 +
1 + 1 + 1 + 1 + 1 + 1 + 1 +
1 + 1 + 1 + 1 + 1 + 1 + 1 +
1 + 1 + 1 + 1 + 1 + 1 + 1
}
The stack trace suggests that this has to do with serialization/deserialization with bincode that was introduced in #743. However this problem cannot be reproduced with the merge commit of #743 itself (but it loops for a long time). The version of bincode hasn't changed since then so there might be an interaction with another later change that triggers the stack overflow.
cargo hax json works fine for me. So the frontend seems fine. Only when extracting F* it fails for me (actually it just doesn't finish before I kill it).
@franziskuskiefer You were right, the problem is in the engine. It happens in phase specialize where in some cases recursive calls are done twice over the same thing resulting in exponential complexity.
I don't know why this started making Rust stack overflows instead of just taking a long time.
It turns out this fix doesn't improve the original problem. The example of deep expressions was only an attempt to minimize that had a different problem.
My current understanding is that when we have a rather big (or deep) module hierarchy, the items in HaxMeta contain a lot of data, and the data for a submodule seems to be duplicated in every parent module. There is no problem for encoding this with bincode, but decoding uses recursion and gives a stack overflow.
I tested a few serialization libraries and most don't support recursive types. Among those that do it seemed that bincode was one of the most efficient (I found none that seemed to behave better with deep recursive data).
I think we should bisect here, to understand exactly what went wrong and when
The result of bisecting is that the first "bad" commit is https://github.com/hacspec/hax/commit/748c4889e164fa69119e6afed3bc3abcf7463c2e
The result of bisecting is that the first "bad" commit is 748c488
Removing is_local from the DefId struct (on the latest main) does fix the problem. It would be really surprising that just this boolean field would make bincode crash. We tried to compare what else is different a json version of the HaxMeta object (that we save with bincode). There are some differences but it is hard to analyze as the file is very big. Minimizing the example is also very hard.
Is the recursive call on the side of hax or inside bincode? If it's in hax you can use stacker to dynamically allocate more stack at each recursive call. This is what charon and rustc use:
// This is the amount of bytes that need to be left on the stack before increasing the size.
// It must be at least as large as the stack required by any code that does not call
// `ensure_sufficient_stack`.
const RED_ZONE: usize = 100 * 1024; // 100k
// Only the first stack that is pushed, grows exponentially (2^n * STACK_PER_RECURSION) from then
// on. This flag has performance relevant characteristics. Don't set it too high.
const STACK_PER_RECURSION: usize = 1024 * 1024; // 1MB
/// Grows the stack on demand to prevent stack overflow. Call this in strategic locations
/// to "break up" recursive calls. E.g. almost any call to `visit_expr` or equivalent can benefit
/// from this.
///
/// Should not be sprinkled around carelessly, as it causes a little bit of overhead.
#[inline]
pub fn ensure_sufficient_stack<R>(f: impl FnOnce() -> R) -> R {
stacker::maybe_grow(RED_ZONE, STACK_PER_RECURSION, f)
}
If it's in bincode then you should report it upstream, that sounds like something they should be handling gracefully (and they can use stacker too).
Yeah, we wanted to use stacker as well, thanks for the tip! :) But there are a number of odd things around this bug, we need to understand exactly what's going on there! And I'd like to address #1001 in the same time :)