fuels-rs icon indicating copy to clipboard operation
fuels-rs copied to clipboard

Arbitrary nesting of heap types in contract returns

Open digorithm opened this issue 2 years ago • 2 comments

Discussed in https://github.com/FuelLabs/fuels-rs/discussions/944

Originally posted by segfault-magnet April 26, 2023 @hal3e and I have been discussing something, details to follow:

Currently, the Rust SDK cannot handle a nested heap type (e.g. -> Option<Vec<u64>>) in the return type of a contract/script. The limitation stems from the fact that returning heap types only returns pointers which are useless without the VM's memory.

A partial workaround was implemented where we would inject some extra bytecode after the contract call. The injected bytecode would generate a ReturnData receipt containing the previously inaccessible heap data. This allowed for returning a single non-nested heap type (e.g. -> Vec<u64>). The injected bytecode is currently incapable of handling anything more complex.

We suggest taking this to the extreme and supporting arbitrarily nested heap types.

There are three main problems we need to solve: structs, enums, and heap types nested in other heap types. Let's start with structs:

Structs

Let's consider the following example:

struct AnotherStruct {
  a: u64,
  b: Vec<u64>
}

struct SomeStruct { 
  a: u64,
  b: u64,
  c: AnotherStruct
}
// ... some contract method below
fn foo() -> SomeStruct {
    SomeStruct {
        a: 10,
        b: 11,
        c: AnotherStruct {
            a: 12,
            b: vec![13, 14, 15],
        },
    }
}

Encoded it looks something like this Notepad1_26

In order to support this we would need to inject a retd instruction that will yank the missing heap data into a separate receipt.

The only difference between this and our current workaround is that the vector pointer isn't encoded immediately at the start. We would need to calculate this offset and add it to the address in the RET register in order to yank the heap data.

This can be done easily since we already know the exact size of every part of the struct.

In this case, it doesn't matter how deep the vector is inside nested structs, a single retd with a correct offset can get its data.

Enums

Let's consider the following example:

struct WrappingStruct {
    a: u64,
    b: Vec<u64>,
}

enum TheEnum {
    A(u64),
    B(WrappingStruct),
}

fn foo() -> TheEnum {
    TheEnum::B(WrappingStruct {
        a: 10,
        b: vec![11, 12, 13],
    })
}

Notepad1_27

With enums, we need to inject bytecode that will branch depending on the enum discriminant. Variants that don't contain heap types don't need to be considered, e.g. our example will only check if the enum carries the second variant.

After the variant is determined, the enum data can be used to generate the additional receipt data as before.

Heap types in other heap types

struct WrappingStruct {
    a: u64,
    b: Vec<u64>,
}

struct ParentStruct {
    a: u64,
    b: Vec<WrappingStruct>,
}

fn foo() -> ParentStruct {
    ParentStruct {
        a: 9,
        b: vec![
            WrappingStruct {
                a: 10,
                b: vec![11, 12, 13],
            },
            WrappingStruct {
                a: 14,
                b: vec![15, 16, 17],
            },
        ],
    }
}

Notepad1_28 In order to get all the necessary data we'd need to issue 3 retd instructions.

Notice that this issue is recursive in nature. Once we issue the first and simplest retd we're right back to the same problem only now the starting address has been moved along.

After collecting the receipts

Due to the structured and deterministic way we've approached walking the type tree (a post-order transversal) we can now use the extra receipts to decode the return type.

Since the abi encoder also does post-order transversing we can adapt it to accept a stack of receipts, popping one each time a heap type is to be decoded.

Or we can merge all the receipts into one, taking care to update the pointers to point to their respective data. Decoding would then be trivial as though we had the VMs memory loaded.

But that is an implementation detail, not that relevant right now. The point is, decoding is possible.

Other technical challenges also exist, such as registry management, but nothing unsolvable as far as we can see.

Considerations

  • The indexer They will not be able to use this every time, just as they weren't able to use the current injection approach. If the contract wasn't called directly through the SDK then the bytecode was never injected and the additional receipts were never generated. They could handle returning raw or typed slices but nested heap types would not be possible then.

  • Logging Logging would not be supported since we cannot inject bytecode at the appropriate places. Logging heap types will only be possible through typed slices without the support of nested heap types.

  • Script support We need to investigate the possibility of injecting the bytecode at the end of user-provided scripts. If it transpires that we'll always have a deterministic way of reaching the script return data, then the approach will be the same as for contracts.

digorithm avatar Jul 17 '23 15:07 digorithm

Waiting to see how far away encoding support is in sway.

segfault-magnet avatar Aug 19 '23 04:08 segfault-magnet

Hi there! A user is facing an incidence of this issue.

cold-briu avatar Nov 06 '23 18:11 cold-briu

Closed by https://github.com/FuelLabs/fuels-rs/pull/1303.

digorithm avatar Apr 04 '24 13:04 digorithm