Macro based storage rework
Most of the existing storage types were removed because they are simply too inefficient in size and executed instructions. Right now we only have Mapping. However, this type doesn't work very well with the existing storage infrastructure and needs a band-aid named SpreadAllocate together with initialize_contract.
We want to pivot more in the direction of how storage works in FRAME. We should keep the overarching contract data structure but use the storage macro to generate the supporting code instead of an elaborate system of traits.
This would also allow us more flexibility in how to generate the storage keys for these data structures and more control over what code is generated. The aim is to reduce the contract sizes by removing monomorphization and copying of the root keys.
Please note that the following is only a sketch of the generated code. It is probably not how the generated code should look. @Robbepop noted that we should not use private sub-modules. Please feel free to post your suggestions and I will update the top post.
We add a new trait that will be implemented by the storage macro to supply our storage traits with a key. Our storage traits have default implementations for every function. The macro will only set the associated types.
trait HasKey {
    const KEY: [u8; 32];
}

trait StorageValue: HasKey {
    type Value: Encode + Decode;

    // we take self here to build in the mutability levels
    fn write(&mut self, value: &Self::Value) {
        ink_env::set_contract_storage(Self::KEY, value.encode());
    }

    fn read(&self) -> Self::Value {
        ink_env::get_contract_storage(Self::KEY)
    }
}
trait StorageMap: HasKey {
    type Key: Hashable;
    type Value: Encode + Decode;

    fn insert(&mut self, key: Self::Key, value: &Self::Value) {
        // `++` stands for byte concatenation
        ink_env::set_contract_storage(twoxx(Self::KEY ++ key), value.encode());
    }

    fn get(&self, key: Self::Key) -> Self::Value {
        ink_env::get_contract_storage(twoxx(Self::KEY ++ key))
    }
}
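To make the sketch above concrete, here is a minimal, dependency-free version of the same trait design: the macro-set associated `KEY` drives default trait methods, while a thread-local map stands in for `ink_env` and `u32` stands in for `Encode + Decode` values. All names here (`AField`, the mock storage functions) are illustrative, not the real codegen output:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Stand-in for ink_env: a thread-local (key -> value) byte store.
thread_local! {
    static STORE: RefCell<HashMap<Vec<u8>, Vec<u8>>> = RefCell::new(HashMap::new());
}

fn set_contract_storage(key: &[u8], value: &[u8]) {
    STORE.with(|s| s.borrow_mut().insert(key.to_vec(), value.to_vec()));
}

fn get_contract_storage(key: &[u8]) -> Option<Vec<u8>> {
    STORE.with(|s| s.borrow().get(key).cloned())
}

trait HasKey {
    const KEY: [u8; 32];
}

// The default methods only need the associated KEY; `u32` stands in
// for `Encode + Decode` values to keep the sketch dependency-free.
trait StorageValue: HasKey {
    fn write(&mut self, value: u32) {
        set_contract_storage(&Self::KEY, &value.to_le_bytes());
    }
    fn read(&self) -> Option<u32> {
        get_contract_storage(&Self::KEY)
            .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

// What the macro would emit for field `a`: a unit-like type whose only
// job is to carry the generated key.
struct AField;
impl HasKey for AField {
    const KEY: [u8; 32] = [1u8; 32]; // the macro would emit a real hash
}
impl StorageValue for AField {}

fn main() {
    let mut a = AField;
    a.write(42);
    assert_eq!(a.read(), Some(42));
}
```

Note that the only per-field code is the `HasKey` impl; everything else is shared default-method logic, which is the monomorphization saving the proposal is after.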
Then the storage macro creates a private module where it dumps all the generated code:
#[ink(storage)]
struct Contract {
a: u8,
b: Mapping<AccountId, Balance>,
}
// The macro would transform the struct into this:
// The underscores are just to prevent clashes with user defined modules.
// Having public fields in the struct would be an error.
mod __private__ {
    pub struct AFieldStorageValue {
        /// prevent instantiation outside of this generated code
        _private: ()
    }
    pub struct BFieldMapping {
        /// prevent instantiation outside of this generated code
        _private: ()
    }
    impl HasKey for AFieldStorageValue {
        // hash of "a: AFieldStorageValue" (via the hex-literal crate;
        // a plain byte string literal would be 64 bytes, not [u8; 32])
        const KEY: [u8; 32] = hex!("3A9ADFE4234B0475EB1767ACD1BCFA75D7B9E0BA1C427FBAF0F476D181D3A820");
    }
    impl HasKey for BFieldMapping {
        // hash of "b: BFieldMapping"
        const KEY: [u8; 32] = hex!("33D27C398EEEA3851B750A1152FF9B63B1D6D10BE18E2D58A745776D9F2FF241");
    }
// the functions all have default operations. We just need to set the types.
impl StorageValue for AFieldStorageValue {
type Value = u32;
}
// the functions all have default operations. We just need to set the types.
impl StorageMap for BFieldMapping {
type Key = AccountId;
type Value = Balance;
}
    // fields are private, so the struct cannot be constructed by the user
pub struct Contract {
a: AFieldStorageValue,
b: BFieldMapping,
}
    impl Contract {
        // we generate this but leave out any ink! defined collection types.
        // initialization is and always will be the crux of the matter. I would
        // suggest forcing the user to initialize everything here except
        // collections. So every user defined type would be wrapped into
        // a `StorageValue` and added here. The rest are ink! collections
        // which might or might not be skipped here.
        pub fn initialize(a: u32) -> Self {
            let mut ret = Self {
                a: AFieldStorageValue { _private: () },
                b: BFieldMapping { _private: () },
            };
            ret.a.write(&a);
            ret
        }
        // we create accessors for each field with the correct mutability.
        // this way we keep the natural way of marking messages as
        // mutable and getters
        pub fn a(&self) -> &AFieldStorageValue { &self.a }
        pub fn b(&self) -> &BFieldMapping { &self.b }
        pub fn a_mut(&mut self) -> &mut AFieldStorageValue { &mut self.a }
        pub fn b_mut(&mut self) -> &mut BFieldMapping { &mut self.b }
}
}
use __private__::Contract;
impl Contract {
// Should we have the `constructor` macro write the `Contract` on `Drop`?
#[ink(constructor)]
pub fn new() -> Self {
    // the user cannot get this wrong because it is the only constructor available
Self::initialize(42)
}
#[ink(message)]
    pub fn do_writes(&mut self) {
        self.a_mut().write(&1);
        self.b_mut().insert(ALICE, &187_334_343);
}
#[ink(message)]
pub fn do_reads(&self) {
let a = self.a().read();
let b = self.b().get(ALICE);
}
#[ink(message)]
    pub fn do_reads_and_writes(&mut self) {
        // just works
        self.a_mut().write(&1);
        self.b_mut().insert(ALICE, &187_334_343);
        let a = self.a().read();
        let b = self.b().get(ALICE);
}
}
The biggest downside of this approach is the same as substrate's: new storage collections always need changes in codegen in order to be supported. However, I assume this will save us a lot of code passing around all those keys and stateful objects (lazy data structures) at runtime. It is also much simpler and easier to reason about (because it has fewer features).
Open Questions
How does this affect/play with things like:
- Multi-file contracts?
- ink-as-dependency contracts
- ink! traits
I played with that idea, tested different implementations and cases, and checked the sizes.
First of all, to reduce the size, it is better to avoid frequent usage of ink_env::set_contract_storage and ink_env::get_contract_storage.
Instead of:
#[ink(storage)]
pub struct Flipper {
value: Storage<bool>,
value2: Storage<bool>,
value3: Storage<u32>,
value4: Storage<u32>,
}
Better to:
#[ink(storage)]
pub struct Flipper {
value: Storage<Test>,
}
struct Test {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
Or
#[ink(storage)]
pub struct Flipper {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
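To illustrate why the packed variants above are cheaper, here is a toy sketch of the packed layout. Plain byte concatenation stands in for SCALE encoding (not the real codec), and `encode`/`decode` are hypothetical helpers: all four fields round-trip through one cell, so one host call loads or flushes everything instead of four.

```rust
/// Packed layout sketch: the four Flipper fields above, stored in one
/// cell as a single concatenated byte buffer (stand-in for SCALE).
struct Flipper {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}

fn encode(f: &Flipper) -> Vec<u8> {
    let mut out = vec![f.value as u8, f.value2 as u8];
    out.extend_from_slice(&f.value3.to_le_bytes());
    out.extend_from_slice(&f.value4.to_le_bytes());
    out
}

fn decode(b: &[u8]) -> Flipper {
    Flipper {
        value: b[0] != 0,
        value2: b[1] != 0,
        value3: u32::from_le_bytes([b[2], b[3], b[4], b[5]]),
        value4: u32::from_le_bytes([b[6], b[7], b[8], b[9]]),
    }
}

fn main() {
    let f = Flipper { value: true, value2: false, value3: 7, value4: 9 };
    let cell = encode(&f);    // one set_contract_storage call stores this
    let back = decode(&cell); // one get_contract_storage call restores all
    assert_eq!(cell.len(), 10);
    assert!(back.value && !back.value2);
    assert_eq!((back.value3, back.value4), (7, 9));
}
```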
The second point is the introduction of accessors like:
// we create accessors for each field with the correct mutability.
// this way we keep the natural way of marking messages as
// mutable and getters
pub fn a(&self) -> &AFieldStorageValue { &self.a }
pub fn b(&self) -> &BFieldMapping { &self.b }
pub fn a_mut(&mut self) -> &mut AFieldStorageValue { &mut self.a }
pub fn b_mut(&mut self) -> &mut BFieldMapping { &mut self.b }
This will be very unclear for the user, plus we will have problems with highlighting in the IDE.
Based on the results I want to propose the following:
- Fields of the contract should only be scale::Decode and scale::Encode (we are removing the traits SpreadLayout, PackedLayout, SpreadAllocate, PackedAllocate).
- The whole contract is stored under the storage key ContractRootKey::ROOT_KEY in one cell -> fields don't have their own storage key by default. The contract is one monolithic structure.
- All structures that are planned to be part of the storage should be defined via #[ink(storage)] (I will describe later why).
- ink! provides Mapping<K, V, Mode = AutoStorageKey>, Storage<T, Mode = AutoStorageKey> and other types to allow the user to manage storage keys. The keys will be defined during compilation time (the main idea of this issue).
ink! will provide types and structures that allow moving some fields into their own cell. These types will have two modes:
- ManualStorageKey<const KEY: u128> - allows specifying the storage key of the type. It is u128 because it is the largest allowed const generic type. In the code it is converted into [u8; 32] via core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0)) (maybe you have a better idea? I tried the idea with generated private types but the size of the contract was larger. With const it showed better results).
- AutoStorageKey - the default mode. It means that the #[ink(storage)] macro will generate the key based on NAME_OF_THE_STRUCTURE::NAME_OF_THE_FIELD. We will replace the type Storage<T, AutoStorageKey> with a new one, Storage<T, ManualStorageKey<hash!("NAME_OF_THE_STRUCTURE::NAME_OF_THE_FIELD")>>. To recognize that a type implements the IsAutoStorage<const KEY: u128> trait we will use the autoref trick. If it is IsAutoStorage<const KEY: u128>, we use IsAutoStorage<const KEY: u128>::StorageType, else we use the previous one.
That solution provides control over the storage keys. The developer can decide by himself which key to use and where. The solution introduces only one limitation - all storage-related structures should be defined via the #[ink(storage)] macro. The main storage of the contract can be marked with #[ink(contract)] or #[ink(storage(main))] or something like that.
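As a possible answer to the "maybe you have a better idea?" aside above: the same `[u8; 32]` can be built safely from `to_le_bytes`, without relying on the layout of a `(u128, u128)` tuple (which is not guaranteed for `repr(Rust)` types). A sketch, where `key_to_root` is a hypothetical helper name:

```rust
/// Safe equivalent of `transmute::<(u128, u128), [u8; 32]>((key, 0))`:
/// the low 16 bytes carry the key (little-endian), the rest stay zero.
fn key_to_root(key: u128) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&key.to_le_bytes());
    out
}

fn main() {
    let k = key_to_root(1);
    assert_eq!(k[0], 1);
    // Everything past the u128 is zero padding.
    assert!(k[1..].iter().all(|&b| b == 0));
}
```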
That solution doesn't affect multi-file contracts because everyone will use the same rules to work with storage -> structures are marked with #[ink(storage)] and implement scale::Decode + scale::Encode. It also doesn't affect traits and ideas used in OpenBrush (we only need to rewrite the storage struct according to the new rules). I see only one problem: we are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.
// `value` and `value2` are `Test2`. All inner storage keys are the same -> self.value.get().value4.set(&144) will also affect self.value2.get().value4
#[ink(storage)]
pub struct Flipper {
value: Storage<Test2, ManualKey<1>>,
value2: Storage<Test2, ManualKey<4>>,
}
#[ink(storage)]
struct Test2 {
value: Storage<bool>,
value2: Storage<bool>,
value3: Storage<u32>,
value4: Storage<u32>,
}
That problem has a workaround with manual offset:
#[ink(storage)]
pub struct Flipper {
value: Storage<Test2<100>, ManualKey<100>>,
value2: Storage<Test2<200>, ManualKey<200>>,
}
#[ink(storage)]
struct Test2<const Offset: u128> {
value: Storage<bool, ManualKey<{ Offset + 1 }>>,
value2: Storage<bool, ManualKey<{ Offset + 2 }>>,
value3: Storage<u32, ManualKey<{ Offset + 3 }>>,
value4: Storage<u32, ManualKey<{ Offset + 4 }>>,
}
But then we need to return an error if the storage layout contains two identical keys, to protect the user from bugs=)
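The duplicate-key check proposed here could run over the flattened layout at metadata-generation time; a sketch with hypothetical names (`find_duplicate_keys` is not ink! API):

```rust
use std::collections::HashSet;

/// Metadata-time check: collect every storage key the layout declares
/// and report any key that occurs more than once.
fn find_duplicate_keys(layout_keys: &[u128]) -> Vec<u128> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for &k in layout_keys {
        if !seen.insert(k) && !dups.contains(&k) {
            dups.push(k);
        }
    }
    dups
}

fn main() {
    // Keys from `Test2<100>` and `Test2<200>` above: offsets keep them apart.
    assert!(find_duplicate_keys(&[101, 102, 103, 104, 201, 202, 203, 204]).is_empty());
    // Without the offset both instances collide on every field key.
    assert_eq!(find_duplicate_keys(&[1, 2, 1, 2]), vec![1, 2]);
}
```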
Below is an example of the code that I used for testing (it is only an MVP). The same idea can be implemented in Mapping.
#[derive(Default)]
struct AutoKey;
#[derive(Default)]
struct ManualKey<const KEY: u128>;
#[derive(Default)]
struct Storage<T: Decode + Encode, Mode = AutoKey>(PhantomData<(T, Mode)>);
#[cfg(feature = "std")]
impl<T: Decode + Encode + scale_info::TypeInfo + 'static, Mode> ::ink_storage::traits::StorageLayout for Storage<T, Mode>
where Self: KeyTrait {
fn layout(_key_ptr: &mut KeyPtr) -> Layout {
Layout::Cell(CellLayout::new::<T>(LayoutKey::from(
&<Self as KeyTrait>::key(),
)))
}
}
trait KeyTrait {
fn key() -> Key;
}
trait StorageTrait: KeyTrait {
type Value: Decode + Encode;
fn get(&self) -> Self::Value {
ink_env::get_contract_storage(&Self::key()).unwrap().unwrap()
}
fn set(&self, value: &Self::Value) {
ink_env::set_contract_storage(&Self::key(), &value);
}
}
impl<T: Decode + Encode, const KEY: u128> KeyTrait for Storage<T, ManualKey<KEY>> {
#[inline]
fn key() -> Key {
Key::from(unsafe { core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0)) })
}
}
impl<T: Decode + Encode, const KEY: u128> StorageTrait for Storage<T, ManualKey<KEY>> {
type Value = T;
}
impl<T: Decode + Encode> KeyTrait for Storage<T, AutoKey> {
#[inline]
fn key() -> Key {
// Here we will put `__ink_error...`
panic!()
}
}
impl<T: Decode + Encode> StorageTrait for Storage<T, AutoKey> {
type Value = T;
}
trait IsAutoKey<const KEY: u128> {
type StorageType;
}
impl<T: Decode + Encode, const KEY: u128> IsAutoKey<KEY> for Storage<T, AutoKey> {
type StorageType = Storage<T, ManualKey<KEY>>;
}
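A dependency-free sketch of the `IsAutoKey` substitution above: the replacement of `Storage<T, AutoKey>` by `Storage<T, ManualKey<KEY>>` happens purely at the type level, so it costs nothing at runtime. The key `7` here is a stand-in for the generated hash, and the structs are simplified copies of the MVP types:

```rust
use std::marker::PhantomData;

struct AutoKey;
struct ManualKey<const KEY: u128>;
struct Storage<T, Mode = AutoKey>(PhantomData<(T, Mode)>);

// Codegen resolves `Storage<T, AutoKey>` to a manually keyed type by
// asking this trait for the replacement, as sketched in the MVP above.
trait IsAutoKey<const KEY: u128> {
    type StorageType;
}

impl<T, const KEY: u128> IsAutoKey<KEY> for Storage<T, AutoKey> {
    type StorageType = Storage<T, ManualKey<KEY>>;
}

fn main() {
    // The generated key (7 here, hypothetically from hash!("Test::field"))
    // is injected purely at the type level:
    type Resolved = <Storage<u32, AutoKey> as IsAutoKey<7>>::StorageType;
    let name = std::any::type_name::<Resolved>();
    assert!(name.contains("ManualKey"));
}
```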
The whole contract is stored under the storage key ContractRootKey::ROOT_KEY in one cell -> Fields don't have their own storage key by default. The contract is one monolith structure.
Is that somewhere reflected in the code you posted? I think it contradicts the rest of what you are saying: we are selecting different keys, manual or automatic.
In your examples I don't see the overarching main contract structure that ties all of that together.
I see only one problem: We are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.
This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
Also, it would be nice to have the example with Mapping instead of Storage. The latter is too simple to understand what is happening there.
Is that somewhere reflected in the code you posted? I think it contradicts the rest of what you are saying: we are selecting different keys, manual or automatic.
I didn't implement the ContractRootKey::ROOT_KEY logic in the example because it is part of the #[ink::contract] macro. All fields implement Decode + Encode. To load a contract we will use ink_env::get_contract_storage(&ContractRootKey::ROOT_KEY); to flush - ink_env::set_contract_storage(&ContractRootKey::ROOT_KEY, &contract.encode()).
The idea is that for some types the developer will create an empty implementation of Decode and Encode, for example Storage or Mapping. Those types work with storage directly. So all atomic types are loaded with the contract. Other types don't require loading.
In your examples I don't see the overarching main contract structure that ties all of that together.
I will prepare a more detailed example=)
This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
It seems I found the solution for that problem. The const_generics_defaults feature is stable now, and we can create structures like:
struct Test2<const KEY: u128 = 0> {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
So our #[ink(storage)] macro can add a generic argument to specify the storage key of the struct during declaration=) I tested it and it works, but the generated code is ugly.
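For reference, a minimal stable-Rust demonstration of the const_generics_defaults idea on a Test2-like struct (`storage_key` is a hypothetical accessor, not ink! API):

```rust
/// `const_generics_defaults` is stable (Rust 1.59+), so a macro can
/// append a key parameter without breaking plain `Test2` usage:
#[derive(Debug, PartialEq)]
struct Test2<const KEY: u128 = 0> {
    value: bool,
    value3: u32,
}

impl<const KEY: u128> Test2<KEY> {
    // Hypothetical accessor showing the key is known at compile time.
    const fn storage_key() -> u128 {
        KEY
    }
}

fn main() {
    // `Test2` without arguments is `Test2<0>`; codegen can still pin a key.
    let default: Test2 = Test2 { value: true, value3: 1 };
    let keyed: Test2<42> = Test2 { value: true, value3: 1 };
    assert_eq!(Test2::<0>::storage_key(), 0);
    assert_eq!(Test2::<42>::storage_key(), 42);
    let _ = (default, keyed);
}
```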
Also, would be nice to have the example with Mapping instead of Storage. The latter is too simple to understand what is happening there.
Will try to add an example for Mapping
too. The problem that example already is big=)
It seems I found the solution for that problem. The const_generics_defaults feature (https://github.com/rust-lang/rust/issues/44580#issuecomment-991782799) is stable now, and we can create structures like:
struct Test2<const KEY: u128 = 0> { value: bool, value2: bool, value3: u32, value4: u32, }
So our #[ink(storage)] macro can add a generic argument to specify the storage key of the struct during declaration=) I tested it and it works, but the generated code is ugly.
No matter - we can't use that KEY to do calculations like { KEY + 2 } during the declaration. It is possible with the unstable feature #![feature(generic_const_exprs)], but we want to stay on stable=)
I will prepare the simple version only with a manual key and with generated.
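One stable workaround for the { KEY + 2 } limitation: the arithmetic that is forbidden in a generic-argument position is allowed in an associated const, so derived keys can be carried as trait-level constants computed from a parent. A sketch with hypothetical names (`StorageKey`, `Field1`, `Root`):

```rust
use core::marker::PhantomData;

// `ManualKey<{ KEY + 2 }>` needs unstable `generic_const_exprs`, but the
// same arithmetic is fine in an associated const on stable Rust.
trait StorageKey {
    const KEY: u128;
}

struct Field1<P: StorageKey>(PhantomData<P>);
struct Field2<P: StorageKey>(PhantomData<P>);

impl<P: StorageKey> StorageKey for Field1<P> {
    const KEY: u128 = P::KEY + 1; // const arithmetic: allowed on stable
}
impl<P: StorageKey> StorageKey for Field2<P> {
    const KEY: u128 = P::KEY + 2;
}

struct Root;
impl StorageKey for Root {
    const KEY: u128 = 100;
}

fn main() {
    assert_eq!(<Field1<Root>>::KEY, 101);
    assert_eq!(<Field2<Root>>::KEY, 102);
}
```

This is essentially the same shape as the `Parent: NextStorageKey` generic discussed below: the parent type feeds the offset in.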
This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.
I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.
If we can catch this during compilation then it is fine. Would be better to catch it during the compilation of the wasm, though.
The idea is that for some types the developer will create an empty implementation of Decode and Encode, for example Storage or Mapping. Those types work with storage directly. So all atomic types are loaded with the contract. Other types don't require loading.
This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key, and the storage types like HashMap simply implement an empty Encode + Decode. Just to reiterate, what we want to achieve with this rewrite is:
- Simplify this whole thing and drop features we don't need (like removing all the traits you mentioned). These are mainly used to determine which values are packed together into a single cell and were designed to only work with lazy data structures (that write on Drop).
- Getting rid of initialize_contract: this is done by generating keys through macros and not by passing them down via parameters.
- Reducing contract size.
I have some further questions:
- Would it be legal to nest those storage types? For example, to allow a Storage with the value of a Mapping? I don't think this makes sense, right?
- In all your examples you just use a numeric key (u128). Would the same type also be used for the automatic key? I think this would be beneficial as it is much smaller than concatenating strings. Could we just use the position within the struct? Like having a counter inside the proc macro? This is how it is currently working (this counter is passed down as an argument). We could even use a smaller type there then (like u32).
- How would you combine the root key of a mapping (a numeric value) with the Key of the Mapping itself? Do we require Display for the Key and just concat (remember we don't want to hash so that the key stays transparent)?
I managed to implement auto-incrementing of the storage key during compilation (it is a hard variant where we don't need to worry about keys)=) Here is an example. It contains only an example with Storage at the moment, but Mapping is the same in implementation. There is ugly code, but it will be generated by the macro. In the comments, I describe how it should look in the contract. I added Debug everywhere to output the keys, so please ignore that stuff. That solution has a problem with the derive macro because I'm adding the generic Parent: NextStorageKey into the definition of the struct, and the derive macro generates bounds for that generic, but it is only used in PhantomData=( Maybe you know some workaround for that?
This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key and the storage types like HashMap simply implement an empty Encode + Decode. Just to reiterate: What we want to achieve with this rewrite is
Nope!=) Currently, we are storing each field under its own storage key. If we put all fields under ink_storage::Pack<T> then we will store everything under one key, but currently it is not like that. I compiled contracts with ink_storage::Pack<T> and they take much less memory, so that is why I'm proposing to store all fields under one key by default - it is the most performant strategy in terms of the contract's size.
If we generate storage keys during compilation, then the empty implementation of Encode + Decode will not affect the size of the contract.
Would it be legal to nest those storage types? For example, to allow a Storage with the value of a Mapping? I don't think this makes sense, right?
Usage like Storage<Mapping<Key, Value>> doesn't make sense. But Storage<Struct>, where Struct contains many fields and some of those fields are Mapping or Storage, is a valid case.
In all your examples you just a numeric key (u128). Would the same type also be used for the automatic key? I think this would be beneficial as it is much smaller as concatenating strings. Could we just use the position within the struct? Like having a counter inside the proc macro? This is how it is currently working (this counter is passed down as argument). We could even use smaller type there then (like u32).
Yeah, I want to propose using some integer as a storage key, because in terms of the contract we don't need to worry about collisions between fields. Yes, in the hard version that I posted it is auto-incremented.
How would you combine the root key of a mapping (a numeric value) with the Key of the Mapping itself? Do we require Display for the Key and just concat (remember we don't want to hash so that the key is transparent).
The storage key implements Encode and the key of the Mapping implements it too. As we discussed with you in the chat, with a new seal_get_storage it can be done like:
let key = (&Mapping::StorageKey, &map_key).encode();
let value = value.encode();
seal_get_storage(key, key.len(), value, value.len())
seal_get_storage will hash it by itself.
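A sketch of the combined key from the snippet above, with fixed-width little-endian bytes standing in for SCALE encoding and `combined_key` being a hypothetical helper: the tuple encoding is just the concatenation of the two parts, and a counter plus an AccountId-sized key already needs 36 bytes, which matches the key-size concern raised later in the thread.

```rust
/// Stand-in for `(&storage_key, &map_key).encode()`: SCALE encodes a
/// tuple as the concatenation of its encoded fields, modeled here with
/// fixed-width little-endian integers instead of the real codec.
fn combined_key(storage_key: u32, map_key: &[u8]) -> Vec<u8> {
    let mut out = storage_key.to_le_bytes().to_vec();
    out.extend_from_slice(map_key);
    out
}

fn main() {
    let account_id = [7u8; 32]; // e.g. an AccountId used as the Mapping key
    let key = combined_key(1, &account_id);
    // 4-byte counter + 32-byte AccountId = 36 bytes, larger than the
    // current 32-byte host-side key limit.
    assert_eq!(key.len(), 36);
    assert_eq!(&key[..4], &1u32.to_le_bytes());
}
```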
That solution has a problem with the derive macro because I'm adding generic Parent: NextStorageKey into the definition of the struct, and derive macro generates bounds for that generic, but it is only in PhantomData=( Maybe you know some workaround for that?
I mentioned that issue: https://github.com/rust-lang/rust/issues/26925
But it seems we can handle it. The derive macro can be expanded before the #[ink(storage)] macro, so we can add the generic for numeration later and it will not have conflicts with #[derive(...)]. This works because of this change: https://github.com/rust-lang/rust/pull/79078
Our macro can check whether at least one derive macro comes after it; if yes, our macro will put itself after the derive macro.
That solution has a problem with the derive macro because I'm adding generic Parent: NextStorageKey into the definition of the struct, and derive macro generates bounds for that generic, but it is only in PhantomData=( Maybe you know some workaround for that?
You mean the one on your Test and Flipper structs? Bounds are merely a heuristic and often wrong. It can detect PhantomData, but in your case the PhantomData is buried deep in other types so the macro cannot see it. This is why we have attributes in parity-scale-codec for its macros to overwrite bounds. In substrate we also have a DefaultNoBound macro for convenience. That said, it seems you have found a better solution by changing the order in which the macros are applied.
I looked through your prototype and I think this would be a very good design. Having a numeric counter at compile time is the best of both worlds. Some remarks:
- Why u128 for the key? Since it is only a counter now we can get away with u32. This would map to a primitive operation in wasm. If you want to be very careful we could also use u64, which still has a native type in wasm. But I don't think it is necessary.
- Why do you implement KeyTrait for ManualTypedKey and Storage<T, ManualTypedKey> with the same implementation? Couldn't the latter just use the former?
- Naming is slightly confusing. I won't mention anything here because it is just a prototype, but we should definitely look into naming later on.
- I think the docs which describe the user definition of Test do not match the generated code: in the docs there is no usage of Storage but in the generated code there is.
- This can't detect duplicate keys, right? I played around a bit and was able to have duplicate keys.
This looks really promising to me. Once this progresses we also need to add a new version of the storage functions to pallet_contracts that allows variably sized keys (with a limit, though). The current size of 32 bytes is too small if contracts do not hash the key, because you need at least sizeof(counter) + sizeof(AccountId), which is 36 bytes assuming that we use u32 for the counter. Maybe 64 bytes would be a sensible limit to allow for Mapping keys that are larger than an AccountId.
Why u128 for the key? Since it is only a counter now we can get away with u32. This would map to a primitive operation in wasm. If you want to be very careful we could also use u64 which still has a native type in wasm. But I don't think it is necessary.
I tested it with the current ink! codegen, so it was simpler to transmute (u128, u128) into [u8; 32]. But yes, we can use u32 or u64 in the final solution. I think u32 should be enough to not have collisions.
Why do you implement KeyTrait for ManualTypedKey and Storage<T, ManualTypedKey> with the same implementation? Couldn't the latter just use the former?
It is caused by the Debug trait for logging=) It was late at night and I tried to finish the example faster. It is only for the println in the example.
I think the docs which describe the user definition of Test do not match the generated code: In the docs there is no usage of Storage but in the generated code there is.
Yep, you are right, I forgot to update the comment.
// The user defined the next struct:
//
// #[derive(Default, Debug)]
// #[ink(storage)]
// struct Test {
// value: bool,
// value2: Storage<bool>,
// value3: u32,
// value4: Storage<u32>,
// }
This can't detect duplicate keys, right? I played around a bit and was able to have duplicate keys.
Yep. I didn't think yet about the question: is it possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove the ManualKey mode and only have Auto and ManualOffset. In that case, the code will not generate duplicate keys.
Once this progresses we also need to add a new version of storage functions to pallet_contracts that allow variably sized keys (with a limit though). The current size of 32bytes is to small if contracts do not hash the key because you need at least sizeof(counter) + sizeof(AccountID) which is 36byte assuming that we use u32 for the counter. Maybe 64byte would be a sensible limit to allow for Mapping keys that are larger than an AccountId.
I think resolving that issue will close that question, as we discussed in the Element channel.
I also want to highlight another problem in that solution. Test is Test<()>, but Test<ManualKey<2>> is another structure. So if the user wants to work with the code that is generated for Test, he should transform it from Test<ManualKey<2>> into Test<()>.
I added AtomicResolveType: if the structure is atomic (doesn't use Storage or Mapping) then we can use Test, else Test<SomeKey>. It helps to not generate useless code and to work with Test where it is possible. But the problem still exists for structures that are complex (that contain Storage or Mapping).
We can generate AsRef and From implementations. But it still can be unclear for the developer why he should convert Test into Test<SomeKey>.
impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsRef<Test<NewParent>> for Test<OldParent> {
#[inline]
fn as_ref(&self) -> &Test<NewParent> {
unsafe { core::mem::transmute::<&Test<OldParent>, &Test<NewParent>>(self) }
}
}
impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsMut<Test<NewParent>> for Test<OldParent> {
#[inline]
fn as_mut(&mut self) -> &mut Test<NewParent> {
unsafe { core::mem::transmute::<&mut Test<OldParent>, &mut Test<NewParent>>(self) }
}
}
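One way to avoid the unsafe transmute sketched above is to rebuild the struct field by field, at the cost of a move instead of a reference cast. A minimal sketch with hypothetical key types (`KeyA`, `KeyB`) and a simplified `Test`:

```rust
use std::marker::PhantomData;

struct Test<Parent> {
    value: bool,
    value3: u32,
    _parent: PhantomData<Parent>,
}

// Rebuilding the struct field by field sidesteps the unsafe
// `transmute::<&Test<Old>, &Test<New>>` above: only the zero-sized
// PhantomData changes, so the compiler can check every field.
impl<Old> Test<Old> {
    fn with_parent<New>(self) -> Test<New> {
        Test {
            value: self.value,
            value3: self.value3,
            _parent: PhantomData,
        }
    }
}

struct KeyA;
struct KeyB;

fn main() {
    let t: Test<KeyA> = Test { value: true, value3: 9, _parent: PhantomData };
    let t2: Test<KeyB> = t.with_parent();
    assert!(t2.value);
    assert_eq!(t2.value3, 9);
}
```

The macro would generate `with_parent` alongside the struct; for borrowed access an `AsRef`-style conversion would still need per-field projection rather than a pointer cast.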
I think resolving https://github.com/paritytech/substrate/issues/7724 will close that question as we discussed in the element channel.
Yes. This will kill two birds with one stone.
Yep. I didn't think yet about the question: Is that possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove ManualKey mode, and only have Auto, ManualOffset. In that case, the code will not generate duplicate keys.
If it makes things easier we should drop the manual key for now. Right now the keys are also derived from order. We can add this later, maybe even as a post-processing step on metadata to not further complicate the code (as you initially suggested).
I also want to highlight another problem in that solution. Test is Test<()>, but Test<ManualKey<2>> is another structure. So if the user wants to work with the code that is generated for Test he should transform it from Test<ManualKey<2>> into Test<()>.
Yeah, this is kind of a bummer. I don't understand the code well enough to help with that right now. But I am sure that using transmute for this isn't the right way.
@xgreenx We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so to not be blocked on that. What are your thoughts?
We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so to not be blocked on that. What are your thoughts?
We need to decide on several questions and then I think we can move forward.
- Do we want to hide the generic type of the NextStorageKey on the storage structure or not?
1.1. We can add the generic type during codegen, as described above. By default it will be (). But if it is not an atomic struct, it will use the generic to have a correct storage key, and the user should understand that he needs to work with the generic struct.
Pros:
- Easy definition of the struct.
- Everything works as expected if it is an atomic struct. Impl sections will work correctly, and derive too.
Cons:
- If it is not an atomic struct, impl sections and derive will not work.
- The user will not understand why the compiler requires him to pass something into the struct and why he should convert from Struct<Key<1>> into Struct<()> or Struct<Key<2>>.
- If the user wants to write some advanced code he should be aware that we are adding a generic during codegen.
- Highlighting will not work in the IDE for a generic type.
1.2. The user should explicitly specify the generic type for NextStorageKey (in case the struct contains several generics, the NextStorageKey should be last). He doesn't need to use it in the struct; the codegen will use it.
Pros:
- We force the user to use generics from the start and he will use it everywhere and the code is clear for him if he understands what is going on.
- The user will write the right code that uses generic and storage keys will be right. So impls sections and derive will work.
- IDE highlighting works correctly.
Cons:
- We force the user to use generics from the start and he will use it everywhere and the code is not clear for him if he doesn't understand what is going on.
- Ugly definition of the struct.
- We need to pass the generic in every impl section and procedure-like function.
The definition looks like:
#[derive(Default, Debug)]
#[ink(storage)]
struct Test<Last: NextStorageKey> {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
- #[ink(storage)] is a procedural macro that will generate impls for helper traits. How do we want to identify the main contract's storage? For example, we can use #[ink::storage] to generate impls. But if it is the main contract's storage, then the user should mark it like #[ink::storage(main)] or #[ink::storage(contract)]. Or you can suggest your own ideas=)
- Maybe you have some suggestions about naming (for all the stuff)=D For example, NextStorageKey is too long; maybe Storage is enough.
1.1 vs 1.2: Given that these are the only two options I am leaning to the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation and is there really no better way (I can't tell)?
For example, we can use #[ink::storage] to generate impls. But if it is a main contract's storage, then the user should mark it like #[ink::storage(main)] or #[ink::storage(contract)].
I would suggest keeping #[ink::storage(<ROOT_KEY>)] for the main storage and using #[ink::storage_ext] for additional storage items. <ROOT_KEY> is optional and defaults to 0. We don't need a way to set keys of individual items, but the root key needs to be customizable, or otherwise we will have conflicts with delegate call.
Maybe you have some suggestions about naming(for all stuff)=D For example, NextStorageKey is too long, maybe Storage is enough.
I agree. If this is visible to the user, a simple name such as Storage would go a long way in not confusing them. In general it is very worthwhile to spend a lot of time on naming. Don't gloss over it; it is time well spent. That said, we can always change the names late in a review as these are simple find-and-replace changes.
1.1 vs 1.2: Given that these are the only two options I am leaning to the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation and is there really no better way (I can't tell)?
I also prefer the explicit solution, because the developer can google the Rust concepts involved and find out what is going on.
is there really no better way (I can't tell)
We want to know the key at compile time for each Storage or Mapping field -> each Storage or Mapping field should have its own storage key.
Here we have two options:
- The key is calculated relative to the previous keys, to create a unique key each time and avoid duplicates. Previously used cells should affect the storage key of the current struct -> the struct should know about its parent -> we need to use generics.
Pros:
- The user can use atomic and non-atomic structs many times.
Cons:
- We add generics everywhere.
- The codegen is harder than in option 2.
- It is harder to understand than option 2.
- The key is generated based on NAME_OF_STRUCT::NAME_OF_FIELD.
Pros:
- The codegen is easy to implement.
- We don't add any additional machinery to identify the storage key (we don't need generics).
- It is easy to calculate and understand.
Cons:
- The user can use a non-atomic struct only once in the code. If it is used twice, the keys will be duplicated.
- The name of each struct should be unique.
But we can detect during metadata generation that someone used the struct more than once and throw an error with a description, and the user should use another struct.
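For illustration, option 2 (name-based keys) can be sketched as a compile-time hash of "StructName::field_name". The fnv1a_64 helper and the field names below are assumptions for this sketch, not the actual ink! codegen:

```rust
// Sketch of option 2: derive each storage key from "StructName::field_name"
// at compile time. `fnv1a_64` is a hypothetical stand-in for whatever
// compile-time hash the macro would actually use.
const fn fnv1a_64(input: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    let mut i = 0;
    while i < input.len() {
        hash ^= input[i] as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a prime
        i += 1;
    }
    hash
}

// Each field gets its key baked in as a constant; no generics needed.
const KEY_VALUE1: u64 = fnv1a_64(b"Flipper::value1");
const KEY_VALUE2: u64 = fnv1a_64(b"Flipper::value2");

fn main() {
    // Different names yield different keys; the same name always yields
    // the same key, which is exactly the "struct used twice" collision
    // problem discussed above.
    assert_ne!(KEY_VALUE1, KEY_VALUE2);
    assert_eq!(fnv1a_64(b"Flipper::value1"), KEY_VALUE1);
    println!("key1 = {KEY_VALUE1:#x}, key2 = {KEY_VALUE2:#x}");
}
```

The collision concern falls out directly: a second instance of the same struct would hash the same names and produce the same keys.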
I would suggest keeping #[ink::storage(<ROOT_KEY>)] for the main storage and using #[ink::storage_ext] for additional storage items. <ROOT_KEY> is optional and defaults to 0. We don't need a way to set keys of individual items but the root key needs to be customizable or otherwise we will have conflicts with delegate call.
I'm okay with that solution=)
Can you please remind me what you mean when you say "atomic struct"?
I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.
I think before you start implementing, it would be best if you'd post a complete example here of how this stuff would be used. No need to write out all the types involved as in the first mock you posted - only from a user perspective, with Mapping and with Storage. Then we can check whether this is actually something that is usable or too complicated for the users.
"atomic struct" - all fields are atomic. All types are atomic except Storage
and Mapping
.
I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.
We still can use u32 or u64; the chance to generate the same key is not so high=)
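A rough birthday-bound estimate puts numbers on "not so high"; the helper below is napkin math, not a formal analysis:

```rust
// Birthday-bound approximation: for n random keys drawn from a space of
// 2^bits values, the collision probability is roughly n*(n-1) / (2 * 2^bits).
fn collision_probability(n: u64, bits: u32) -> f64 {
    let space = 2f64.powi(bits as i32);
    (n as f64) * ((n - 1) as f64) / (2.0 * space)
}

fn main() {
    // Assuming ~20 fields per contract, as estimated later in the thread:
    let p32 = collision_probability(20, 32);
    let p64 = collision_probability(20, 64);
    println!("~20 fields, u32 keys: p ≈ {p32:e}");
    println!("~20 fields, u64 keys: p ≈ {p64:e}");
    assert!(p32 < 1e-7);  // around 4.4e-8
    assert!(p64 < 1e-16); // around 1e-17
}
```

Note this only covers accidental collisions between distinct names; reusing the same struct still collides deterministically regardless of key width.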
I think before you start implementing, it would be best if you'd post a complete example here on how this stuff would be used. No need to write out all the types involved as in the first mock you posted. Only from a user perspective. With Mapping and with Storage. Then we can check whether this is actually something that is usable or to complicated for the users.
const ROOT_KEY: u32 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value1: AtomicStruct,
    value2: NonAtomicStruct,
    value3: Storage<AtomicStruct>,    // possible
    value4: Storage<NonAtomicStruct>, // possible
    value5: Mapping<u128, AtomicStruct>,
    // Compilation error, it is not possible:
    // value6: Mapping<u128, NonAtomicStruct>,
}

impl Flipper {
    fn get_value1(&self) -> &AtomicStruct {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &NonAtomicStruct<T> {
        &self.value2
    }

    fn get_value3<T: Storage>(&self) -> &Storage<AtomicStruct, T> {
        &self.value3
    }

    fn get_value4<T1: Storage, T2: Storage>(&self) -> &Storage<NonAtomicStruct<T2>, T1> {
        &self.value4
    }

    fn get_value5<T: Storage>(&self) -> &Mapping<u128, AtomicStruct, T> {
        &self.value5
    }
}
#[ink::storage_ext]
pub struct AtomicStruct<Last: Storage = ()> {
    value1: bool,
    value2: u32,
    value3: Vec<u32>,
    value4: String,
}

#[ink::storage_ext]
pub struct NonAtomicStruct<Last: Storage = ()> {
    value1: Mapping<u32, bool>,
    value2: Storage<Vec<u32>>,
}

impl<Last: Storage> NonAtomicStruct<Last> {
    fn get_value1<T: Storage>(&self) -> &Mapping<u32, bool, T> {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &Storage<Vec<u32>, T> {
        &self.value2
    }
}
We still can use u32 or u64, the chance to generate the same key is not so high=)
I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate it to 8 bytes? We would still have the problem of collisions if a struct is used multiple times.
I think with a numeric solution we could even get away with u8. We would make the Key generic and just use u8 by default. If we overflow, we throw a compile-time error and ask the user to specify a bigger type.
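A minimal sketch of how such a compile-time overflow error could look, using a const assertion; the names and the sequential key assignment are illustrative assumptions, not actual ink! codegen:

```rust
// Sketch of the "u8 keys by default" idea: the macro assigns sequential
// numeric keys and fails the build on overflow. The const assertion below
// stands in for that compile-time error.
const FIELD_COUNT: usize = 4;

// With more than 256 fields this stops compiling, and the user would be
// asked to switch the key type to something wider (e.g. u16).
const _: () = assert!(FIELD_COUNT <= (u8::MAX as usize) + 1);

// Sequentially assigned 1-byte keys stored in the data section.
const KEYS: [u8; FIELD_COUNT] = [0, 1, 2, 3];

fn main() {
    assert_eq!(KEYS.len(), FIELD_COUNT);
    assert_eq!(KEYS[3], 3);
    println!("keys: {KEYS:?}");
}
```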
Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.
Just one question: why is the type parameter on the storage structs called Last? It seems like a strange name.
I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate to 8 byte? We would still have the problem of collisions if a struct was used multiple times.
Yes, we can do the same as for selectors: we will truncate it to u32 or u64.
I think with a numeric solution we could even get away with u8. We would make the Key generic and just use u8 by default. If we overflow we throw a compile time error and ask the user to specify a bigger type.
In WASM it still is u32; I think it is better to keep u32.
Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.
Okay=) It looks normal to me. I only worry that someone coming from Solidity could be scared of it=)
Just one question: Why is the type parameter on the storage structs called Last? It seems like a strange name.
Because actually we will pass the last data type (the type that took the previous cell), which will be used to get the next storage key. We can use Parent, but it is not actually a parent=) We can simply call it Key: Storage or S: Storage. I'm not a master of naming=(
In WASM it still is u32, I think it is better to keep u32.
I think those constants will go into the wasm data section and can be packed densely there. Of course they will be loaded into an i32 when they are used, but we still save the space in the data section. We would need to verify that, though.
We can use Parent but it is not actually parent=) We can simply call it Key: Storage or S: Storage. I'm not master of the word=(
Some renaming suggestions:
- I would rename our concrete Storage type to StoredValue.
- Our Storage trait deals mainly with storage keys, if I understand it correctly. I suggest renaming it to StorageKey.
- The name of the type variable is then Key: StorageKey.
Do you have a preference regarding numeric keys vs truncated 64-bit hashes? My thoughts on truncated hashes:
Con:
- We need to store non-compressible 64-bit values for each field in our data section.
- Given that we use hash(CONTRACT_ROOTKEY ++ STRUCT_NAME ++ FIELD_NAME): we cannot use the same non-atomic struct twice. We would need an attribute to override the STRUCT_NAME in that case.
Pro:
- Simpler
- User doesn't need to deal with generics?
Let's do some napkin math regarding the size benefits of numeric keys. Let's say numeric keys take 1 byte of contract size (stored in the data section, which is linear memory) and truncated keys take 8 bytes. So we are looking at 7 bytes of overhead for each field.
How many fields does the average contract have? I would say fewer than 20 (maybe even way fewer). That would be 140 bytes of size overhead for the truncated-hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had StoredValue already. I would assume the number of fields wouldn't go up indefinitely, because you would group them into atomic structs for efficiency reasons.
Given these numbers I might be inclined to say that truncated hash version might be better if it is substantially simpler and gets rid of generics for the user.
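The napkin math above, written out as a small helper (per-field overhead = 8-byte truncated hash minus 1-byte numeric key):

```rust
// Data-section overhead of truncated-hash keys over numeric keys,
// using the 1-byte and 8-byte estimates from the thread.
fn data_section_overhead(fields: u64) -> u64 {
    let hash_key_bytes = 8;    // truncated 64-bit hash
    let numeric_key_bytes = 1; // u8 numeric key
    fields * (hash_key_bytes - numeric_key_bytes)
}

fn main() {
    assert_eq!(data_section_overhead(20), 140); // "average" contract estimate
    assert_eq!(data_section_overhead(4), 28);   // ERC20-sized
    assert_eq!(data_section_overhead(7), 49);   // multisig-sized
    println!("overhead for 20 fields: {} bytes", data_section_overhead(20));
}
```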
Given these numbers I might be inclined to say that truncated hash version might be better if it is substantially simpler and gets rid of generics for the user.
I also prefer the version with the truncated hash, because it will be simpler for me as a smart contract developer. I don't like putting generics everywhere in cases where I plan to use only atomic structs (or a non-atomic struct, but only once).
How many fields does the average contract have? I would say less than 20 (maybe even way less). That would be 140 byte size overhead for the truncated hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had StoredValue already. I would assume the amount of fields wouldn't go up indefinitely because you would group them into atomic structs for efficiency reasons.
Yea, the overhead seems not too high + the 1-byte assumption requires investigation=)
Given that we use hash(CONTRACT_ROOTKEY ++ STRUCT_NAME ++ FIELD_NAME): We cannot use the same non atomic struct twice. We would need an attribute to override the STRUCT_NAME it in that case.
If someone wants to use some struct twice or more, they can specify the generic themselves.
/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, ManualKey<hash_u64!("Test::value"), Key>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Test::value2"), Key>>,
    value3: StoredValue<u32, ManualKey<hash_u64!("Test::value3"), Key>>,
    value4: StoredValue<u32, ManualKey<hash_u64!("Test::value4"), Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
Having to manually key every field in TestTwice seems a bit extreme just so we can use it twice. Couldn't the supplied const Key be picked up by the automatic key generation? The macro should be able to detect that there is a const Key specified and use it.
We can add an XOR part to AutoKey:
/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;

/// The auto-generated storage key (based on the name of the struct and field) will be XORed with the `XOR` value.
struct AutoKey<const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32, AutoKey<42>>,
    value4: StoredValue<u32>,
}
The macro should be able to detect that there is a const Key specified and use it.
We can try to automate that process. For that, we need some marker to understand that a generic is meant to be the storage key. Maybe we can define an alias in the ink_storage crate like pub type StorageKey = u64 and check that the type in the const generic is StorageKey (it will not work if the user aliases it again).
Then we can simplify:
#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}
Into:
#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool>,
    value2: StoredValue<bool>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
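A hand-expanded sketch of what the simplified form could generate, in plain Rust; the StorageKey alias follows the suggestion above, while the placeholder field hashes and the XOR scheme are assumptions for illustration:

```rust
// Sketch: the macro sees `const KEY: StorageKey` on the struct and threads
// it into every field key automatically. This is a manual expansion of
// what it could generate, not real ink! codegen.
pub type StorageKey = u64;

#[allow(dead_code)]
struct TestTwice<const KEY: StorageKey> {
    value: bool,
    value2: u32,
}

impl<const KEY: StorageKey> TestTwice<KEY> {
    // What AutoKey<KEY> would resolve to per field; the literals stand in
    // for hashes of "TestTwice::value" and "TestTwice::value2".
    const VALUE_KEY: StorageKey = 0x1111 ^ KEY;
    const VALUE2_KEY: StorageKey = 0x2222 ^ KEY;
}

fn main() {
    // Two instantiations of the same struct get disjoint key sets:
    assert_ne!(TestTwice::<1>::VALUE_KEY, TestTwice::<2>::VALUE_KEY);
    // And within one instantiation, fields keep distinct keys:
    assert_ne!(TestTwice::<1>::VALUE_KEY, TestTwice::<1>::VALUE2_KEY);
    println!("ok");
}
```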
Yes, this is what I had in mind. Sure, we can't detect the re-alias, but I don't think this is a case we need to consider.
This all looks pretty solid to me. The latest example looks like something I can very much live with. We should have the rest of the ink! team look over it so we are all on the same page.
This discussion is pretty dense, but I'll try and take a look later this week
@xgreenx Regarding #[ink(storage)] and #[ink::storage_ext]: in an ideal world we would need only #[ink(storage)] and detect whether the struct is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one storage attribute?
@xgreenx Regarding #[ink(storage)] and #[ink::storage_ext]: in an ideal world we would need only #[ink(storage)] and detect whether the struct is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one storage attribute?
I don't think that it is possible. I think it is better to add some kind of marker for the root contract, like #[ink(storage, root)]. We can ask the developer to always specify the root_key for the root storage: #[ink(storage, root_key = 0)]=)
and detect if the struct is meant to be the top-level storage item or a nested one
I only see two ways:
/// What xgreenx suggested
#[ink(storage)] + #[ink(storage, root, root_key = <OPTIONAL_ROOT_KEY>)]
/// Define the root struct by name
/// I favor this one because it also prevents declaring two structs as the root struct.
#[ink(storage)] + #[ink(contract, storage = struct_name, root_key = <OPTIONAL_ROOT_KEY>)]
How else would you detect the root struct? You could always just define the "first struct" as the root struct, but I don't think this would be a better solution.
We included that issue in our roadmap, so we are waiting for approval from the ink! team to start work on it=)
Okay, so I finally got around to taking a read today. I can't say I follow everything, but I have some initial thoughts and questions:
- What is this AtomicResolverHelper?
- What is this supposed to indicate: const IS_VALUE_ATOMIC: bool;
  - So atomic ~= primitive types?
  - How's this different from a StorageValue<$primitive_type>?
- Maybe the generic in Test<Parent> should be Key or RootKey instead?
- I like that CELL_USED is calculated at compile time
- Appreciate that your code runs
- Need to re-read the discussion about why macro ordering matters
- Agree, we can do the manual keys later; auto-keys seem like a good start
- I don't mind forcing users to use generics, they're a pretty basic Rust construct
  - Main advantage here is IDE support, which I think makes up for the "confusion"
- #[ink::storage(<ROOT_KEY>)] and storage_ext seem fine to me
  - Maybe rename storage_ext to storage_item?
- Don't understand the discussion here and here
  - May just need another read
Direction looks really good in general though!