move icon indicating copy to clipboard operation
move copied to clipboard

[Bug] Move prover bytecode translator error

Open awelc opened this issue 2 years ago • 25 comments

🐛 Bug

I have discovered a problem with the bytecode translator on certain snippets of Move code (please see below for an example on how to reproduce this). I have done some investigation, and I am happy to continue digging, but if someone more familiar with the codebase could offer some help or advice, it would be great.

The problem manifests itself in the Boogie backend in bytecode_translator.rs:1632 - the inst vector has length 0:

                    let (read_aggregate, update_aggregate, elem_ty) = match dst_ty {
                        Type::Vector(et) => ("ReadVec", "UpdateVec", et.as_ref().clone()),
                        // If its not a vector, we assume it is the Table type.
-->                     Type::Struct(_, _, inst) => ("GetTable", "UpdateTable", inst[1].clone()),
                        _ => unreachable!(),
                    };

The reason for this is that BorrowEdge::Index edge type is used to indicate borrowing both from a vector and from a table, and in this case it is incorrectly identified as the edge indicating borrowing from a table though no tables are used in the code.

I am pretty sure that the problem is in the pass responsible for borrow analysis, and I suspect it is related to how hyper edges summarize and propagate borrow information of entire functions to callers of these functions.

When you look at the output produced by borrow analysis pass for the inner function in the example below, you will see borrow information of this kind (notice index edge indicated by []):

# borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>), Reference($t1))}, Reference($t1) -> {([], Reference($t3))}

It makes perfect sense that as the index edge represents borrowing from a vector due to $t1 and $t3 types:

var $t1: &mut vector<m1::HasAnotherVector>
var $t3: &mut m1::HasAnotherVector

In the output for the outer function, we see something a bit different:

# borrowed_by: Reference($t0) -> {([]/.v (vector<m1::HasAnotherVector>), Reference($t2))}

I haven't yet figured out how hyper edges are used in the translation process, but the above may imply that $t0 and $t2 will be connected via an index edge eventually, which would be a problem due to their types, none of which is a vector:

fun m1::outer($t0|has_vector: &mut m1::HasVector) { ... }
var $t2: &mut m1::HasAnotherVector

I am not 100% sure that this is indeed a problem (and that's where some advice would be helpful), particularly that a similar information in the mid function, with the edges in hyper edge reordered, does not seem to be triggering this problem (if you remove the mid function and call inner function from outer directly, the problem will disappear):

# borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>)/[], Reference($t1))}

This is with $t0 and $t1 types as follows:

fun m1::mid($t0|has_vector: &mut m1::HasVector): &mut m1::HasAnotherVector { ... }
var $t1: &mut m1::HasAnotherVector

To reproduce

This is the code that causes the prover to fail:

module prover_bug::m1 {
    use std::vector;

    struct HasVector {
        v: vector<HasAnotherVector>,
    }

    struct HasAnotherVector {
        v: vector<u8>,
    }


    fun outer(has_vector: &mut HasVector) {
        let has_another_vector = mid(has_vector);
        vector::push_back(&mut has_another_vector.v, 42)
    }

    fun mid(has_vector: &mut HasVector): &mut HasAnotherVector {
        inner(has_vector)
    }

    fun inner(has_vector: &mut HasVector): &mut HasAnotherVector {
        vector::borrow_mut(&mut has_vector.v, 7)
    }
}

awelc avatar Jul 16 '22 21:07 awelc

particularly that a similar information in the mid function, with the edges in hyper edge reordered, does not seem to be triggering this problem

Why does removing the mid function cause the edges in the hyper edge to be reordered though? The edge order after with mid removed seems to be the correct order, i.e. (.v (vector<m1::HasAnotherVector>)/[], which is field followed by index. I think there might have been somewhere in writeback that wrongly reorders the edges in a hyper edge, that causes this issue...

emmazzz avatar Jul 16 '22 21:07 emmazzz

Why does removing the mid function cause the edges in the hyper edge to be reordered though?

What I think is happening is that when hyper edges are propagated inner to its caller, their sub-eges will always be in (presumably) correct order. It's another level of propagation that seems to be the problem and reorders sub-edges (presumably) incorrectly.

awelc avatar Jul 16 '22 21:07 awelc

This sounds crazy but if you add another layer, the issue goes away...

/// This module defines a minimal Coin and Balance.
module NamedAddr::BasicCoin {
    use std::vector;

    struct HasVector {
        v: vector<HasAnotherVector>,
    }

    struct HasAnotherVector {
        v: vector<u8>,
    }


    fun outer(has_vector: &mut HasVector) {
        let has_another_vector = mid(has_vector);
        vector::push_back(&mut has_another_vector.v, 42)
    }

    fun mid(has_vector: &mut HasVector): &mut HasAnotherVector {
        inner(has_vector)
    }

    fun inner(has_vector: &mut HasVector): &mut HasAnotherVector {
        innermost(has_vector)
    }

    fun innermost(has_vector: &mut HasVector): &mut HasAnotherVector {
        vector::borrow_mut(&mut has_vector.v, 7)
    }
}

emmazzz avatar Jul 16 '22 21:07 emmazzz

I wonder if there's anywhere where we reverse the order of edges by doing something like popping elements from the back and inserting them in the reverse order.

emmazzz avatar Jul 16 '22 21:07 emmazzz

There's a place where we reverse the path when constructing a hyper edge: https://github.com/move-language/move/blob/main/language/move-prover/bytecode/src/borrow_analysis.rs#L271. I suspect there's something off here.

emmazzz avatar Jul 16 '22 22:07 emmazzz

The more I think about it, the more it seems to me like the fact that the problem occurs under one order and not the other is not the real reason. I wonder if having these index edges (seemingly, as I am not sure about that) connecting non-vector type nodes in the propagated hyper edges is a real problem...

awelc avatar Jul 16 '22 22:07 awelc

having these index edges (seemingly, as I am not sure about that) connecting non-vector type nodes in the propagated hyper edges is a real problem

It is a problem and it's caused by edges being wrongly reordered. The borrow edge from t0 (&mut HasVector) to t2 (&mut HasAnotherVector) should be: (1) field borrow t0.v to get t1 (&mut vector<HasAnotherVector>), (2) index borrow t1[7] to get t2. But somehow having another layer of propagation reverses this order.

emmazzz avatar Jul 16 '22 22:07 emmazzz

To follow up on my line of thought (which is very possibly wrong), in inner you have:

borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>), Reference($t1))}, Reference($t1) -> {([], Reference($t3))}

Which has:

Reference($t0) -> {(.v (vector<m1::HasAnotherVector>), Reference($t1))},

and

Reference($t1) -> {([], Reference($t3))}

And this seems to be summarized in mid to

borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>)/[], Reference($t1))}

We somehow loose info about Reference($t1) -> {([], Reference($t3))} from inner in the process.

awelc avatar Jul 16 '22 22:07 awelc

I think the $t1 in inner and mid are different things? In inner, $t1 is of type vector<HasAnotherVector>, and is the field of $t0. In mid, $t1 is the return value of mid, which is the $t3 in inner.

We somehow loose info about Reference($t1) -> {([], Reference($t3))} from inner in the process.

IIUC this information is summarized in the hyper edge Reference($t0) -> {(.v (vector<m1::HasAnotherVector>)/[], Reference($t1))}, which has both field edge and index edge in (.v (vector<m1::HasAnotherVector>)/[].

This is my understanding, which might be wrong. Time to tag @wrwg to shed some light here.

emmazzz avatar Jul 16 '22 22:07 emmazzz

I think I was not clear enough :-( In inner you have:

Reference($t0) -> {(.v (vector<m1::HasAnotherVector>), Reference($t1))},

where $t1: &mut vector<m1::HasAnotherVector> and $t0|has_vector: &mut m1::HasVector, and we have:

Reference($t1) -> {([], Reference($t3))}

where $t3: &mut m1::HasAnotherVector

In the second case, it makes sense to have index edge there as $t1 is a vector type.

On the other hand, in mid we have:

borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>)/[], Reference($t1))}

which kind of summarizes the above, but not quite because neither $t0 nor $t1 in mid is a vector type: $t0|has_vector: &mut m1::HasVector and $t1: &mut m1::HasAnotherVector

awelc avatar Jul 16 '22 22:07 awelc

neither $t0 nor $t1 in mid is a vector type: $t0|has_vector: &mut m1::HasVector and $t1: &mut m1::HasAnotherVector

Right, but $t0 has a field v that is a vector type. And I think .v (vector<m1::HasAnotherVector>) means we are taking field .v to get a vector<m1::HasAnotherVector>.

emmazzz avatar Jul 16 '22 22:07 emmazzz

neither $t0 nor $t1 in mid is a vector type: $t0|has_vector: &mut m1::HasVector and $t1: &mut m1::HasAnotherVector

Right, but $t0 has a field v that is a vector type. And I think .v (vector<m1::HasAnotherVector>) means we are taking field .v to get a vector<m1::HasAnotherVector>.

You may be right! I am groping in the dark here a bit :-) I was thinking that the two parts of the hyper edge (divided by /) are independent in a way, meaning that if we "expand" it we'd have something like Reference($t0) -> {([]BLAH, Reference($t1))} in mid which would be wrong due to missing vector type.

You are right that it's probably time for @wrwg to provide some additional context and/or advice here!

awelc avatar Jul 16 '22 22:07 awelc

Have you debug print the dst_ty in the match dest_ty which causes the issue (sorry if I missed it)?

wrwg avatar Jul 17 '22 04:07 wrwg

Have you debug print the dst_ty in the match dest_ty which causes the issue (sorry if I missed it)?

I have but did not include it as it's not very illuminating in this context, providing only ModuleId and StructId:

DST TYPE: Struct(ModuleId(1), StructId(Symbol(103)), [])

As there are only 2 structs in the minimal example, it's most likely either HasVector or HasAnotherVector.

I am fairly confident that the problem is that index edge somehow sneaks in to connect nodes representing structs (but not tables) rather than vectors.

awelc avatar Jul 17 '22 04:07 awelc

Have you debug print the dst_ty in the match dest_ty which causes the issue (sorry if I missed it)?

dst_ty is HasVector.

@awelc you can print the type with TypeDisplayContext and it will give you better looking information!

let ty_ctx = 
    &TypeDisplayContext::WithEnv {
        env: self.parent.env,
        type_param_names: None,
    };
println!("dst_ty is {}", dst_ty.display(ty_ctx));

emmazzz avatar Jul 17 '22 04:07 emmazzz

Even easier, use https://github.com/move-language/move/blob/750903126278ae325461b8c5c9d24f1c708965f5/language/move-model/src/model.rs#L1681

wrwg avatar Jul 17 '22 04:07 wrwg

I wonder what the priority of this (clear) bug is. Is this important for you? If so do you not have a reasonable workaround?

wrwg avatar Jul 17 '22 04:07 wrwg

I wonder what the priority of this (clear) bug is. Is this important for you? If so do you not have a reasonable workaround?

The minimal example was distilled from one of our real-life examples. One possible workaround is to use another level of indirection, but since we are starting to actually integrate the prover into our flavor of Move, it would be good to get a proper fix.

I am happy to continue working on this myself, but it would be good to get some pointers if possible. At this point we have two hypotheses (@emmazzz, please correct me if I am wrong):

  • the order of sub-edges in a hyper edge is the main reason for the failure and otherwise the borrow info is correct
  • the borrow info in the hyper edges is incorrect as it contains an index sub-edge between types that do not include a vector type

Knowing if any of these is correct would be a good start. I realize that this thread is a big long, but skipping the first few comments and starting with this one perhaps would help: https://github.com/move-language/move/issues/270#issuecomment-1186309265

awelc avatar Jul 17 '22 05:07 awelc

I dug a little deeper for learning purposes but also since I want this out of the way.

The comment for summarize function in borrow analyzer says the following:

    /// Collect those leaves which are returned and summarize them in a hyper edge.
    /// Each of those leaves has a path `in_mut -> ref1 .. -> refn -> out_mut`.
    /// We create a hyper edge `in_mut --summarize(ref1, .., refn)-> out_mut` for it.

And I believe that this is what this function actually does, but I am not sure if the output is actually correct. In the inner function (in the example) you have:

borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>), Reference($t2))}, Reference($t2) -> {([], Reference($t4))}

This gets summarized to:

borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>)/[], Return(0))}

(in inner function $t0 is &mut m1::HasVector and the return type of this function is &mut m1::HasAnotherVector)

Which in the mid function ends up being the following (with the node type translated accordingly to reflect types as defined in mid):

borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>)/[], Reference($t1))}

(in mid function $t0 is &mut m1::HasVector1 and $t1 is &mut m1::HasAnotherVector)

In some sense, we loose information here, as we no longer know that an path from Reference($t0) to Reference($t1) goes through a node representing a vector.

And that's when things get a bit weird, that is that's where the order of edges captured in the hyper edge actually matters. The hyper edge becomes a part of the WriteBack operation and when the WriteBack operation is being translated in the Boogie backend's bytecode translator (translate_writeback function), only the first sub-edge of the hyper edge that it contains is taken into consideration.

This is why the order of sub-edges in the hyper edge matters - if we are lucky and hit the field edge all is good, but if we are unlucky and the first edge is an index edge then we have a problem.

It would be good to know why only one sub-edge from all those contained in the hyper edge is considered there (@wrwg?).

awelc avatar Jul 18 '22 23:07 awelc

The hyper edge becomes a part of the WriteBack operation and when the WriteBack operation is being translated in the Boogie backend's bytecode translator (translate_writeback function), only the first sub-edge of the hyper edge that it contains is taken into consideration.

I think the later sub-edges are taken into consideration too. translate_write_back_update is a recursive function. After we are done with the first sub-edge, we call translate_write_back_update with incremented index at + 1, such as here: https://github.com/move-language/move/blob/c9bce5832193fef28dfd680b6f91c5ced15e0ff2/language/move-prover/boogie-backend/src/bytecode_translator.rs#L1611-L1621.

emmazzz avatar Jul 19 '22 00:07 emmazzz

I think the later sub-edges are taken into consideration too. translate_write_back_update is a recursive function.

Right... This is some complicated code! To continue the analysis, though, if we have field edge followed by index edge, the new type destination is computed based on the field type and recursively passed to translate_write_back_update. That's why in the following case there is no error ('mid` function):

borrowed_by: Reference($t0) -> {(.v (vector<m1::HasAnotherVector>)/[], Reference($t1))}

If index edge is first, though, translate_write_back_update will receive the existing original (non-vector) destination type, which will crash the prover in this case (outer function):

borrowed_by: Reference($t0) -> {([]/.v (vector<m1::HasAnotherVector>), Reference($t2))}

awelc avatar Jul 19 '22 01:07 awelc

If index edge is first, though, translate_write_back_update will receive the existing original (non-vector) destination type, which will crash the prover in this case (outer function):

Exactly. The correct order should be [field, index] for both mid and outer. We need to figure out why index edge is the first sub-edge in outer's case, and correct it.

emmazzz avatar Jul 19 '22 01:07 emmazzz

I think I finally figured out where the bug is after running the algorithm through some examples on paper. I will try to put together a PR tomorrow to see if my idea makes any sense.

awelc avatar Jul 20 '22 02:07 awelc

@wrwg, @emmazzz - I have created a PR where I attempted to fix the problem: https://github.com/move-language/move/pull/289 (please PR's description for details).

awelc avatar Jul 21 '22 18:07 awelc

It is fantastic to see we have now two proposals to fix this (#289 and #298 from @meng-xu-cs)! I'm still quite a bit under water, do you folks agree which is the right fix, or should I look closer?

BTW I haven't paid attention lately but we do run all of the old DPN to check for regressions and both fixes (amazingly) pass this test, right?

wrwg avatar Jul 23 '22 19:07 wrwg

Fixed in https://github.com/move-language/move/pull/298

awelc avatar Aug 02 '22 22:08 awelc