hax icon indicating copy to clipboard operation
hax copied to clipboard

Stack overflow for deep expressions in bincode serialization/deserialization

Open maximebuyse opened this issue 1 year ago • 6 comments

With rust code containing deep expressions, hax fails with a stack overflow. For example:

fn main() -> i32 {
        1 + 1 + 1 + 1 + 1 + 1 + 1 + 
        1 + 1 + 1 + 1 + 1 + 1 + 1 + 
        1 + 1 + 1 + 1 + 1 + 1 + 1 + 
        1 + 1 + 1 + 1 + 1 + 1 + 1
}

The stack trace suggests that this has to do with serialization/deserialization with bincode that was introduced in #743. However this problem cannot be reproduced with the merge commit of #743 itself (but it loops for a long time). The version of bincode hasn't changed since then so there might be an interaction with another later change that triggers the stack overflow.

maximebuyse avatar Sep 03 '24 15:09 maximebuyse

cargo hax json works fine for me. So the frontend seems fine. Only when extracting F* it fails for me (actually it just doesn't finish before I kill it).

franziskuskiefer avatar Sep 03 '24 15:09 franziskuskiefer

@franziskuskiefer You were right, the problem is in the engine. It happens in phase specialize where in some cases recursive calls are done twice over the same thing resulting in exponential complexity. I don't know why this started making Rust stack overflows instead of just taking a long time.

maximebuyse avatar Sep 04 '24 10:09 maximebuyse

It turns out this fix doesn't improve the original problem. The example of deep expressions was only an attempt to minimize that had a different problem.

My current understanding is that when we have a rather big (or deep) module hierarchy, the items in HaxMeta contain a lot of data, and the data for a submodule seems to be duplicated in every parent module. There is no problem for encoding this with bincode, but decoding uses recursion and gives a stack overflow.

I tested a few serialization libraries and most don't support recursive types. Among those that do it seemed that bincode was one of the most efficient (I found none that seemed to behave better with deep recursive data).

maximebuyse avatar Sep 10 '24 07:09 maximebuyse

I think we should bisect here, to understand exactly what went wrong and when

W95Psp avatar Sep 23 '24 07:09 W95Psp

The result of bisecting is that the first "bad" commit is https://github.com/hacspec/hax/commit/748c4889e164fa69119e6afed3bc3abcf7463c2e

maximebuyse avatar Sep 24 '24 13:09 maximebuyse

The result of bisecting is that the first "bad" commit is 748c488

Removing is_local from the DefId struct (on the latest main) does fix the problem. It would be really surprising that just this boolean field would make bincode crash. We tried to compare what else is different a json version of the HaxMeta object (that we save with bincode). There are some differences but it is hard to analyze as the file is very big. Minimizing the example is also very hard.

maximebuyse avatar Sep 25 '24 07:09 maximebuyse

Is the recursive call on the side of hax or inside bincode? If it's in hax you can use stacker to dynamically allocate more stack at each recursive call. This is what charon and rustc use:

// This is the amount of bytes that need to be left on the stack before increasing the size.
// It must be at least as large as the stack required by any code that does not call
// `ensure_sufficient_stack`.
const RED_ZONE: usize = 100 * 1024; // 100k

// Only the first stack that is pushed, grows exponentially (2^n * STACK_PER_RECURSION) from then
// on. This flag has performance relevant characteristics. Don't set it too high.
const STACK_PER_RECURSION: usize = 1024 * 1024; // 1MB

/// Grows the stack on demand to prevent stack overflow. Call this in strategic locations
/// to "break up" recursive calls. E.g. almost any call to `visit_expr` or equivalent can benefit
/// from this.
///
/// Should not be sprinkled around carelessly, as it causes a little bit of overhead.
#[inline]
pub fn ensure_sufficient_stack<R>(f: impl FnOnce() -> R) -> R {
    stacker::maybe_grow(RED_ZONE, STACK_PER_RECURSION, f)
}

If it's in bincode then you should report it upstream, that sounds like something they should be handling gracefully (and they can use stacker too).

Nadrieril avatar Oct 16 '24 17:10 Nadrieril

Yeah, we wanted to use stacker as well, thanks for the tip! :) But there are a number of odd things around this bug, we need to understand exactly what's going on there! And I'd like to address #1001 in the same time :)

W95Psp avatar Oct 16 '24 17:10 W95Psp