Macro based storage rework
Most of the existing storage types were removed because they are simply too inefficient in size and executed instructions. Right now we only have Mapping. However, this type doesn't work very well with the existing storage infrastructure and needs a band-aid named SpreadAllocate together with initialize_contract.
We want to pivot more in the direction of how storage works in FRAME. We should keep the overarching contract data structure but use the storage macro to generate the supporting code instead of an elaborate system of traits.
This would also allow us more flexibility in how to generate the storage keys for these data structures and more control over what code is generated. The aim is to reduce the contract sizes by removing monomorphization and copying of the root keys.
Please note that the following is only a sketch of the generated code. It is probably not how the generated code should look. @Robbepop noted that we should not use private sub-modules. Please feel free to post your suggestions and I will update the top post.
We add a new trait that will be implemented by the storage macro to supply our storage traits with a key. Our storage traits have default implementations for every function. The macro will only set the associated types.
trait HasKey {
    const KEY: [u8; 32];
}

trait StorageValue: HasKey {
    type Value: Encode + Decode;

    // we take self here to build in the mutability levels
    fn write(&mut self, value: &Self::Value) {
        ink_env::set_contract_storage(Self::KEY, value.encode());
    }

    fn read(&self) -> Self::Value {
        ink_env::get_contract_storage(Self::KEY)
    }
}
trait StorageMap: HasKey {
    type Key: Hashable;
    type Value: Encode + Decode;

    fn insert(&mut self, key: Self::Key, value: &Self::Value) {
        // `++` stands for byte concatenation
        ink_env::set_contract_storage(twoxx(Self::KEY ++ key), value.encode());
    }

    fn get(&self, key: Self::Key) -> Self::Value {
        ink_env::get_contract_storage(twoxx(Self::KEY ++ key))
    }
}
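To make the sketch above concrete, here is a minimal, dependency-free version of the same trait design: the macro-set associated `KEY` drives default trait methods, while a thread-local map stands in for `ink_env` and `u32` stands in for `Encode + Decode` values. All names here (`AField`, the mock storage functions) are illustrative, not the real codegen output:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Stand-in for ink_env: a thread-local (key -> value) byte store.
thread_local! {
    static STORE: RefCell<HashMap<Vec<u8>, Vec<u8>>> = RefCell::new(HashMap::new());
}

fn set_contract_storage(key: &[u8], value: &[u8]) {
    STORE.with(|s| s.borrow_mut().insert(key.to_vec(), value.to_vec()));
}

fn get_contract_storage(key: &[u8]) -> Option<Vec<u8>> {
    STORE.with(|s| s.borrow().get(key).cloned())
}

trait HasKey {
    const KEY: [u8; 32];
}

// The default methods only need the associated KEY; `u32` stands in
// for `Encode + Decode` values to keep the sketch dependency-free.
trait StorageValue: HasKey {
    fn write(&mut self, value: u32) {
        set_contract_storage(&Self::KEY, &value.to_le_bytes());
    }
    fn read(&self) -> Option<u32> {
        get_contract_storage(&Self::KEY)
            .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

// What the macro would emit for field `a`: a unit-like type whose only
// job is to carry the generated key.
struct AField;
impl HasKey for AField {
    const KEY: [u8; 32] = [1u8; 32]; // the macro would emit a real hash
}
impl StorageValue for AField {}

fn main() {
    let mut a = AField;
    a.write(42);
    assert_eq!(a.read(), Some(42));
}
```

Note that the only per-field code is the `HasKey` impl; everything else is shared default-method logic, which is the monomorphization saving the proposal is after.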
Then the storage macro creates a private module where it dumps all the generated code:
#[ink(storage)]
struct Contract {
a: u8,
b: Mapping<AccountId, Balance>,
}
// The macro would transform the struct into this:
// The underscores are just to prevent clashes with user defined modules.
// Having public fields in the struct would be an error.
mod __private__ {
    pub struct AFieldStorageValue {
        /// prevent instantiation outside of this generated code
        _private: ()
    }
    pub struct BFieldMapping {
        /// prevent instantiation outside of this generated code
        _private: ()
    }
    impl HasKey for AFieldStorageValue {
        // hash of "a: AFieldStorageValue" (via the hex-literal crate;
        // a plain byte string literal would be 64 bytes, not [u8; 32])
        const KEY: [u8; 32] = hex!("3A9ADFE4234B0475EB1767ACD1BCFA75D7B9E0BA1C427FBAF0F476D181D3A820");
    }
    impl HasKey for BFieldMapping {
        // hash of "b: BFieldMapping"
        const KEY: [u8; 32] = hex!("33D27C398EEEA3851B750A1152FF9B63B1D6D10BE18E2D58A745776D9F2FF241");
    }
// the functions all have default operations. We just need to set the types.
impl StorageValue for AFieldStorageValue {
type Value = u32;
}
// the functions all have default operations. We just need to set the types.
impl StorageMap for BFieldMapping {
type Key = AccountId;
type Value = Balance;
}
    // fields are private, so the struct cannot be constructed by the user
pub struct Contract {
a: AFieldStorageValue,
b: BFieldMapping,
}
    impl Contract {
        // we generate this but leave out any ink! defined collection types.
        // initialization is and always will be the crux of the matter. I would
        // suggest forcing the user to initialize everything here except
        // collections. So every user defined type would be wrapped into
        // a `StorageValue` and added here. The rest are ink! collections
        // which might or might not be skipped here.
        pub fn initialize(a: u32) -> Self {
            let mut ret = Self {
                a: AFieldStorageValue { _private: () },
                b: BFieldMapping { _private: () },
            };
            ret.a.write(&a);
            ret
        }
        // we create accessors for each field with the correct mutability.
        // this way we keep the natural way of marking messages as
        // mutable and getters
        pub fn a(&self) -> &AFieldStorageValue { &self.a }
        pub fn b(&self) -> &BFieldMapping { &self.b }
        pub fn a_mut(&mut self) -> &mut AFieldStorageValue { &mut self.a }
        pub fn b_mut(&mut self) -> &mut BFieldMapping { &mut self.b }
}
}
use __private__::Contract;
impl Contract {
// Should we have the `constructor` macro write the `Contract` on `Drop`?
#[ink(constructor)]
pub fn new() -> Self {
    // the user cannot get this wrong because it is the only constructor available
Self::initialize(42)
}
#[ink(message)]
    pub fn do_writes(&mut self) {
        self.a_mut().write(&1);
        self.b_mut().insert(ALICE, &187_334_343);
}
#[ink(message)]
pub fn do_reads(&self) {
let a = self.a().read();
let b = self.b().get(ALICE);
}
#[ink(message)]
    pub fn do_reads_and_writes(&mut self) {
        // just works
        self.a_mut().write(&1);
        self.b_mut().insert(ALICE, &187_334_343);
        let a = self.a().read();
        let b = self.b().get(ALICE);
}
}
The biggest downside of this approach is the same as substrate's: new storage collections always need changes in codegen in order to be supported. However, I assume this will save us a lot of code passing around all those keys and stateful objects (lazy data structures) at runtime. It is also much simpler and easier to reason about (because it has fewer features).
Open Questions
How does this affect/play with things like:
- Multi-file contracts?
- ink-as-dependency contracts
- ink! traits
I played with that idea, tested different implementations and cases, and checked the sizes.
First of all, to reduce the size, it is better to avoid frequent usage of ink_env::set_contract_storage and ink_env::get_contract_storage.
Instead of:
#[ink(storage)]
pub struct Flipper {
value: Storage<bool>,
value2: Storage<bool>,
value3: Storage<u32>,
value4: Storage<u32>,
}
Better to:
#[ink(storage)]
pub struct Flipper {
value: Storage<Test>,
}
struct Test {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
Or
#[ink(storage)]
pub struct Flipper {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
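To illustrate why the packed variants above are cheaper, here is a toy sketch of the packed layout. Plain byte concatenation stands in for SCALE encoding (not the real codec), and `encode`/`decode` are hypothetical helpers: all four fields round-trip through one cell, so one host call loads or flushes everything instead of four.

```rust
/// Packed layout sketch: the four Flipper fields above, stored in one
/// cell as a single concatenated byte buffer (stand-in for SCALE).
struct Flipper {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}

fn encode(f: &Flipper) -> Vec<u8> {
    let mut out = vec![f.value as u8, f.value2 as u8];
    out.extend_from_slice(&f.value3.to_le_bytes());
    out.extend_from_slice(&f.value4.to_le_bytes());
    out
}

fn decode(b: &[u8]) -> Flipper {
    Flipper {
        value: b[0] != 0,
        value2: b[1] != 0,
        value3: u32::from_le_bytes([b[2], b[3], b[4], b[5]]),
        value4: u32::from_le_bytes([b[6], b[7], b[8], b[9]]),
    }
}

fn main() {
    let f = Flipper { value: true, value2: false, value3: 7, value4: 9 };
    let cell = encode(&f);    // one set_contract_storage call stores this
    let back = decode(&cell); // one get_contract_storage call restores all
    assert_eq!(cell.len(), 10);
    assert!(back.value && !back.value2);
    assert_eq!((back.value3, back.value4), (7, 9));
}
```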
The second point is the introduction of accessors like:
// we create accessors for each field with the correct mutability.
// this way we keep the natural way of marking messages as
// mutable and getters
pub fn a(&self) -> &AFieldStorageValue { &self.a }
pub fn b(&self) -> &BFieldMapping { &self.b }
pub fn a_mut(&mut self) -> &mut AFieldStorageValue { &mut self.a }
pub fn b_mut(&mut self) -> &mut BFieldMapping { &mut self.b }
This will be very unclear for the user, plus we will have problems with highlighting in the IDE.
Based on the results I want to propose the following:
- Fields of the contract should only be scale::Decode and scale::Encode (we are removing the traits SpreadLayout, PackedLayout, SpreadAllocate, PackedAllocate).
- The whole contract is stored under the storage key ContractRootKey::ROOT_KEY in one cell -> fields don't have their own storage key by default. The contract is one monolithic structure.
- All structures that are planned to be part of the storage should be defined via #[ink(storage)] (I will describe later why).
- ink! provides Mapping<K, V, Mode = AutoStorageKey>, Storage<T, Mode = AutoStorageKey> and other types to allow the user to manage storage keys. The keys will be defined during compilation time (the main idea of this issue).
ink! will provide types and structures that allow moving some fields into their own cell. These types will have two modes:
- ManualStorageKey<const KEY: u128> - allows specifying the storage key of the type. It is u128 because it is the largest allowed const generic type. In the code it is converted into [u8; 32] via core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0)) (maybe you have a better idea? I tried the idea with generated private types but the size of the contract was larger. With const it showed better results).
- AutoStorageKey - the default mode. It means that the #[ink(storage)] macro will generate the key based on NAME_OF_THE_STRUCTURE::NAME_OF_THE_FIELD. We will replace the type Storage<T, AutoStorageKey> with a new one, Storage<T, ManualStorageKey<hash!("NAME_OF_THE_STRUCTURE::NAME_OF_THE_FIELD")>>. To recognize that a type implements the IsAutoStorage<const KEY: u128> trait we will use the autoref trick. If it is IsAutoStorage<const KEY: u128>, we use IsAutoStorage<const KEY: u128>::StorageType, else we use the previous one.
That solution provides control over the storage keys. The developer can decide by himself which key to use and where. The solution introduces only one limitation - all storage-related structures should be defined via the #[ink(storage)] macro. The main storage of the contract can be marked with #[ink(contract)] or #[ink(storage(main))] or something like that.
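As a possible answer to the "maybe you have a better idea?" aside above: the same `[u8; 32]` can be built safely from `to_le_bytes`, without relying on the layout of a `(u128, u128)` tuple (which is not guaranteed for `repr(Rust)` types). A sketch, where `key_to_root` is a hypothetical helper name:

```rust
/// Safe equivalent of `transmute::<(u128, u128), [u8; 32]>((key, 0))`:
/// the low 16 bytes carry the key (little-endian), the rest stay zero.
fn key_to_root(key: u128) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&key.to_le_bytes());
    out
}

fn main() {
    let k = key_to_root(1);
    assert_eq!(k[0], 1);
    // Everything past the u128 is zero padding.
    assert!(k[1..].iter().all(|&b| b == 0));
}
```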
That solution doesn't affect multi-file contracts because everyone will use the same rules to work with storage -> structures are marked with #[ink(storage)] and implement scale::Decode + scale::Encode. It also doesn't affect traits and ideas used in OpenBrush (we only need to rewrite the storage struct according to the new rules). I see only one problem: we are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.
// `value` and `value2` are `Test2`. All inner storage keys are the same -> self.value.get().value4.set(&144) will also affect self.value2.get().value4
#[ink(storage)]
pub struct Flipper {
value: Storage<Test2, ManualKey<1>>,
value2: Storage<Test2, ManualKey<4>>,
}
#[ink(storage)]
struct Test2 {
value: Storage<bool>,
value2: Storage<bool>,
value3: Storage<u32>,
value4: Storage<u32>,
}
That problem has a workaround with manual offset:
#[ink(storage)]
pub struct Flipper {
value: Storage<Test2<100>, ManualKey<100>>,
value2: Storage<Test2<200>, ManualKey<200>>,
}
#[ink(storage)]
struct Test2<const Offset: u128> {
value: Storage<bool, ManualKey<{ Offset + 1 }>>,
value2: Storage<bool, ManualKey<{ Offset + 2 }>>,
value3: Storage<u32, ManualKey<{ Offset + 3 }>>,
value4: Storage<u32, ManualKey<{ Offset + 4 }>>,
}
But then we need to return an error if the storage layout contains two identical keys, to protect the user from bugs=)
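The duplicate-key check proposed here could run over the flattened layout at metadata-generation time; a sketch with hypothetical names (`find_duplicate_keys` is not ink! API):

```rust
use std::collections::HashSet;

/// Metadata-time check: collect every storage key the layout declares
/// and report any key that occurs more than once.
fn find_duplicate_keys(layout_keys: &[u128]) -> Vec<u128> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for &k in layout_keys {
        if !seen.insert(k) && !dups.contains(&k) {
            dups.push(k);
        }
    }
    dups
}

fn main() {
    // Keys from `Test2<100>` and `Test2<200>` above: offsets keep them apart.
    assert!(find_duplicate_keys(&[101, 102, 103, 104, 201, 202, 203, 204]).is_empty());
    // Without the offset both instances collide on every field key.
    assert_eq!(find_duplicate_keys(&[1, 2, 1, 2]), vec![1, 2]);
}
```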
Below is an example of the code that I used for testing (it is only an MVP). The same idea can be implemented in Mapping.
#[derive(Default)]
struct AutoKey;
#[derive(Default)]
struct ManualKey<const KEY: u128>;
#[derive(Default)]
struct Storage<T: Decode + Encode, Mode = AutoKey>(PhantomData<(T, Mode)>);
#[cfg(feature = "std")]
impl<T: Decode + Encode + scale_info::TypeInfo + 'static, Mode> ::ink_storage::traits::StorageLayout for Storage<T, Mode>
where Self: KeyTrait {
fn layout(_key_ptr: &mut KeyPtr) -> Layout {
Layout::Cell(CellLayout::new::<T>(LayoutKey::from(
&<Self as KeyTrait>::key(),
)))
}
}
trait KeyTrait {
fn key() -> Key;
}
trait StorageTrait: KeyTrait {
type Value: Decode + Encode;
fn get(&self) -> Self::Value {
ink_env::get_contract_storage(&Self::key()).unwrap().unwrap()
}
fn set(&self, value: &Self::Value) {
ink_env::set_contract_storage(&Self::key(), &value);
}
}
impl<T: Decode + Encode, const KEY: u128> KeyTrait for Storage<T, ManualKey<KEY>> {
#[inline]
fn key() -> Key {
Key::from(unsafe { core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0)) })
}
}
impl<T: Decode + Encode, const KEY: u128> StorageTrait for Storage<T, ManualKey<KEY>> {
type Value = T;
}
impl<T: Decode + Encode> KeyTrait for Storage<T, AutoKey> {
#[inline]
fn key() -> Key {
// Here we will put `__ink_error...`
panic!()
}
}
impl<T: Decode + Encode> StorageTrait for Storage<T, AutoKey> {
type Value = T;
}
trait IsAutoKey<const KEY: u128> {
type StorageType;
}
impl<T: Decode + Encode, const KEY: u128> IsAutoKey<KEY> for Storage<T, AutoKey> {
type StorageType = Storage<T, ManualKey<KEY>>;
}
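A dependency-free sketch of the `IsAutoKey` substitution above: the replacement of `Storage<T, AutoKey>` by `Storage<T, ManualKey<KEY>>` happens purely at the type level, so it costs nothing at runtime. The key `7` here is a stand-in for the generated hash, and the structs are simplified copies of the MVP types:

```rust
use std::marker::PhantomData;

struct AutoKey;
struct ManualKey<const KEY: u128>;
struct Storage<T, Mode = AutoKey>(PhantomData<(T, Mode)>);

// Codegen resolves `Storage<T, AutoKey>` to a manually keyed type by
// asking this trait for the replacement, as sketched in the MVP above.
trait IsAutoKey<const KEY: u128> {
    type StorageType;
}

impl<T, const KEY: u128> IsAutoKey<KEY> for Storage<T, AutoKey> {
    type StorageType = Storage<T, ManualKey<KEY>>;
}

fn main() {
    // The generated key (7 here, hypothetically from hash!("Test::field"))
    // is injected purely at the type level:
    type Resolved = <Storage<u32, AutoKey> as IsAutoKey<7>>::StorageType;
    let name = std::any::type_name::<Resolved>();
    assert!(name.contains("ManualKey"));
}
```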
The whole contract is stored under the storage key ContractRootKey::ROOT_KEY in one cell -> Fields don't have their own storage key by default. The contract is one monolith structure.
Is that somewhere reflected in the code you posted? I think it contradicts the rest of what you are saying: we are selecting different keys, manual or automatic.
In your examples I don't see the overarching main contract structure that ties all of that together.
I see only one problem: We are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.
This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
Also, it would be nice to have the example with Mapping instead of Storage. The latter is too simple to understand what is happening there.
Is that somewhere reflected in the code you posted? I think it contradicts the rest of what you are saying: we are selecting different keys, manual or automatic.
I didn't implement the ContractRootKey::ROOT_KEY logic in the example because it is part of the #[ink::contract] macro. All fields implement Decode + Encode. To load a contract we will use ink_env::get_contract_storage(&ContractRootKey::ROOT_KEY); to flush - ink_env::set_contract_storage(&ContractRootKey::ROOT_KEY, &contract.encode()).
The idea is that for some types the developer will create an empty implementation of Decode and Encode, for example Storage or Mapping. Those types work with storage directly. So all atomic types are loaded with the contract. Other types don't require loading.
In your examples I don't see the overarching main contract structure that ties all of that together.
I will prepare a more detailed example=)
This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
It seems I found the solution for that problem. The const_generics_defaults feature is stable now, and we can create structures like:
struct Test2<const KEY: u128 = 0> {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
So our #[ink(storage)] macro can add a generic argument to specify the storage key of the struct during declaration=) I tested it and it works, but the generated code is ugly.
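For reference, a minimal stable-Rust demonstration of the const_generics_defaults idea on a Test2-like struct (`storage_key` is a hypothetical accessor, not ink! API):

```rust
/// `const_generics_defaults` is stable (Rust 1.59+), so a macro can
/// append a key parameter without breaking plain `Test2` usage:
#[derive(Debug, PartialEq)]
struct Test2<const KEY: u128 = 0> {
    value: bool,
    value3: u32,
}

impl<const KEY: u128> Test2<KEY> {
    // Hypothetical accessor showing the key is known at compile time.
    const fn storage_key() -> u128 {
        KEY
    }
}

fn main() {
    // `Test2` without arguments is `Test2<0>`; codegen can still pin a key.
    let default: Test2 = Test2 { value: true, value3: 1 };
    let keyed: Test2<42> = Test2 { value: true, value3: 1 };
    assert_eq!(Test2::<0>::storage_key(), 0);
    assert_eq!(Test2::<42>::storage_key(), 42);
    let _ = (default, keyed);
}
```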
Also, would be nice to have the example with Mapping instead of Storage. The latter is too simple to understand what is happening there.
Will try to add an example for Mapping
too. The problem that example already is big=)
It seems I found the solution for that problem. The const_generics_defaults feature (https://github.com/rust-lang/rust/issues/44580#issuecomment-991782799) is stable now, and we can create structures like:
struct Test2<const KEY: u128 = 0> { value: bool, value2: bool, value3: u32, value4: u32, }
So our #[ink(storage)] macro can add a generic argument to specify the storage key of the struct during declaration=) I tested it and it works, but the generated code is ugly.
No matter - we can't use that KEY to do calculations like { KEY + 2 } during the declaration. It is possible with the unstable feature #![feature(generic_const_exprs)], but we want to stay on stable=)
I will prepare the simple version only with a manual key and with generated.
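One stable workaround for the { KEY + 2 } limitation: the arithmetic that is forbidden in a generic-argument position is allowed in an associated const, so derived keys can be carried as trait-level constants computed from a parent. A sketch with hypothetical names (`StorageKey`, `Field1`, `Root`):

```rust
use core::marker::PhantomData;

// `ManualKey<{ KEY + 2 }>` needs unstable `generic_const_exprs`, but the
// same arithmetic is fine in an associated const on stable Rust.
trait StorageKey {
    const KEY: u128;
}

struct Field1<P: StorageKey>(PhantomData<P>);
struct Field2<P: StorageKey>(PhantomData<P>);

impl<P: StorageKey> StorageKey for Field1<P> {
    const KEY: u128 = P::KEY + 1; // const arithmetic: allowed on stable
}
impl<P: StorageKey> StorageKey for Field2<P> {
    const KEY: u128 = P::KEY + 2;
}

struct Root;
impl StorageKey for Root {
    const KEY: u128 = 100;
}

fn main() {
    assert_eq!(<Field1<Root>>::KEY, 101);
    assert_eq!(<Field2<Root>>::KEY, 102);
}
```

This is essentially the same shape as the `Parent: NextStorageKey` generic discussed below: the parent type feeds the offset in.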
This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.
I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.
If we can catch this during compilation then it is fine. Would be better to catch it during the compilation of the wasm, though.
The idea is that for some types the developer will create an empty implementation of Decode and Encode, for example Storage or Mapping. Those types work with storage directly. So all atomic types are loaded with the contract. Other types don't require loading.
This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key, and the storage types like HashMap simply implement an empty Encode + Decode. Just to reiterate, what we want to achieve with this rewrite is:
- Simplify this whole thing and drop features we don't need (like removing all the traits you mentioned). These are mainly used to determine which values are packed together into a single cell and were designed to only work with lazy data structures (that write on Drop).
- Getting rid of initialize_contract: this is done by generating keys through macros and not by passing them down via parameters.
- Reducing contract size.
I have some further questions:
- Would it be legal to nest those storage types? For example, to allow a Storage with the value of a Mapping? I don't think this makes sense, right?
- In all your examples you just use a numeric key (u128). Would the same type also be used for the automatic key? I think this would be beneficial as it is much smaller than concatenating strings. Could we just use the position within the struct? Like having a counter inside the proc macro? This is how it is currently working (this counter is passed down as an argument). We could even use a smaller type there then (like u32).
- How would you combine the root key of a mapping (a numeric value) with the Key of the Mapping itself? Do we require Display for the Key and just concat (remember we don't want to hash so that the key stays transparent)?
I managed to implement auto-incrementing of the storage key during compilation (it is a hard variant where we don't need to worry about keys)=) Here is an example. It contains only an example with Storage at the moment, but Mapping is the same in implementation. There is ugly code, but it will be generated by the macro. In the comments, I describe how it should look in the contract. I added Debug everywhere to output the keys, so please ignore that stuff. That solution has a problem with the derive macro because I'm adding the generic Parent: NextStorageKey into the definition of the struct, and the derive macro generates bounds for that generic, but it is only used in PhantomData=( Maybe you know some workaround for that?
This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key and the storage types like HashMap simply implement an empty Encode + Decode. Just to reiterate: What we want to achieve with this rewrite is
Nope!=) Currently, we are storing each field under its own storage key. If we put all fields under ink_storage::Pack<T> then we will store everything under one key, but currently it is not like that. I compiled contracts with ink_storage::Pack<T> and they take much less memory, so that is why I'm proposing to store all fields under one key by default - it is the most performant strategy in terms of the contract's size.
If we generate storage keys during compilation, then the empty implementation of Encode + Decode will not affect the size of the contract.
Would it be legal to nest those storage types? For example, to allow a Storage with the value of a Mapping? I don't think this makes sense, right?
Usage like Storage<Mapping<Key, Value>> doesn't make sense. But Storage<Struct>, where Struct contains many fields and some of those fields are Mapping or Storage, is a valid case.
In all your examples you just a numeric key (u128). Would the same type also be used for the automatic key? I think this would be beneficial as it is much smaller as concatenating strings. Could we just use the position within the struct? Like having a counter inside the proc macro? This is how it is currently working (this counter is passed down as argument). We could even use smaller type there then (like u32).
Yeah, I want to propose using some integer as a storage key, because in terms of the contract we don't need to worry about collisions between fields. Yes, in the hard version that I posted it is auto-incremented.
How would you combine the root key of a mapping (a numeric value) with the Key of the Mapping itself? Do we require Display for the Key and just concat (remember we don't want to hash so that the key is transparent).
The storage key implements Encode and the key of the Mapping implements it too. As we discussed with you in the chat, with a new seal_get_storage it can be done like:
let key = (&Mapping::StorageKey, &map_key).encode();
let value = value.encode();
seal_get_storage(key, key.len(), value, value.len())
seal_get_storage will hash it by itself.
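A sketch of the combined key from the snippet above, with fixed-width little-endian bytes standing in for SCALE encoding and `combined_key` being a hypothetical helper: the tuple encoding is just the concatenation of the two parts, and a counter plus an AccountId-sized key already needs 36 bytes, which matches the key-size concern raised later in the thread.

```rust
/// Stand-in for `(&storage_key, &map_key).encode()`: SCALE encodes a
/// tuple as the concatenation of its encoded fields, modeled here with
/// fixed-width little-endian integers instead of the real codec.
fn combined_key(storage_key: u32, map_key: &[u8]) -> Vec<u8> {
    let mut out = storage_key.to_le_bytes().to_vec();
    out.extend_from_slice(map_key);
    out
}

fn main() {
    let account_id = [7u8; 32]; // e.g. an AccountId used as the Mapping key
    let key = combined_key(1, &account_id);
    // 4-byte counter + 32-byte AccountId = 36 bytes, larger than the
    // current 32-byte host-side key limit.
    assert_eq!(key.len(), 36);
    assert_eq!(&key[..4], &1u32.to_le_bytes());
}
```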
That solution has a problem with the derive macro because I'm adding generic Parent: NextStorageKey into the definition of the struct, and derive macro generates bounds for that generic, but it is only in PhantomData=( Maybe you know some workaround for that?
I mentioned that issue: https://github.com/rust-lang/rust/issues/26925
But it seems we can handle it. The derive macro can be expanded before the #[ink(storage)] macro, so we can add the generic for numeration later and it will not have conflicts with #[derive(...)]. This works because of this change: https://github.com/rust-lang/rust/pull/79078
Our macro can check whether at least one derive macro comes after it; if yes, our macro will put itself after the derive macro.
That solution has a problem with the derive macro because I'm adding generic Parent: NextStorageKey into the definition of the struct, and derive macro generates bounds for that generic, but it is only in PhantomData=( Maybe you know some workaround for that?
You mean the one on your Test and Flipper structs? Bounds are merely a heuristic and often wrong. It can detect PhantomData, but in your case the PhantomData is buried deep in other types so the macro cannot see it. This is why we have attributes in parity-scale-codec for its macros to overwrite bounds. In substrate we also have a DefaultNoBound macro for convenience. That said, it seems you have found a better solution by changing the order in which the macros are applied.
I looked through your prototype and I think this would be a very good design. Having a numeric counter at compile time is the best of both worlds. Some remarks:
- Why u128 for the key? Since it is only a counter now we can get away with u32. This would map to a primitive operation in wasm. If you want to be very careful we could also use u64, which still has a native type in wasm. But I don't think it is necessary.
- Why do you implement KeyTrait for ManualTypedKey and Storage<T, ManualTypedKey> with the same implementation? Couldn't the latter just use the former?
- Naming is slightly confusing. I won't mention anything here because it is just a prototype, but we should definitely look into naming later on.
- I think the docs which describe the user definition of Test do not match the generated code: in the docs there is no usage of Storage but in the generated code there is.
- This can't detect duplicate keys, right? I played around a bit and was able to have duplicate keys.
This looks really promising to me. Once this progresses we also need to add a new version of the storage functions to pallet_contracts that allows variably sized keys (with a limit, though). The current size of 32 bytes is too small if contracts do not hash the key, because you need at least sizeof(counter) + sizeof(AccountId), which is 36 bytes assuming that we use u32 for the counter. Maybe 64 bytes would be a sensible limit to allow for Mapping keys that are larger than an AccountId.
Why u128 for the key? Since it is only a counter now we can get away with u32. This would map to a primitive operation in wasm. If you want to be very careful we could also use u64 which still has a native type in wasm. But I don't think it is necessary.
I tested it with the current ink! codegen, so it was simpler to transmute (u128, u128) into [u8; 32]. But yes, we can use u32 or u64 in the final solution. I think u32 should be enough to not have collisions.
Why do you implement KeyTrait for ManualTypedKey and Storage<T, ManualTypedKey> with the same implementation? Couldn't the latter just use the former?
It is caused by the Debug trait for logging=) It was late at night and I tried to finish the example faster. It is only for the println in the example.
I think the docs which describe the user definition of Test do not match the generated code: In the docs there is no usage of Storage but in the generated code there is.
Yep, you are right, I forgot to update the comment.
// The user defined the next struct:
//
// #[derive(Default, Debug)]
// #[ink(storage)]
// struct Test {
// value: bool,
// value2: Storage<bool>,
// value3: u32,
// value4: Storage<u32>,
// }
This can't detect duplicate keys, right? I played around a bit and was able to have duplicate keys.
Yep. I didn't think yet about the question: is it possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove the ManualKey mode and only have Auto and ManualOffset. In that case, the code will not generate duplicate keys.
Once this progresses we also need to add a new version of storage functions to pallet_contracts that allow variably sized keys (with a limit though). The current size of 32bytes is to small if contracts do not hash the key because you need at least sizeof(counter) + sizeof(AccountID) which is 36byte assuming that we use u32 for the counter. Maybe 64byte would be a sensible limit to allow for Mapping keys that are larger than an AccountId.
I think resolving that issue will close that question, as we discussed in the Element channel.
I also want to highlight another problem in that solution. Test is Test<()>, but Test<ManualKey<2>> is another structure. So if the user wants to work with the code that is generated for Test, he should transform it from Test<ManualKey<2>> into Test<()>.
I added AtomicResolveType: if the structure is atomic (doesn't use Storage or Mapping) then we can use Test, else Test<SomeKey>. It helps to not generate useless code and to work with Test where it is possible. But the problem still exists for structures that are complex (that contain Storage or Mapping).
We can generate AsRef and From implementations. But it still can be unclear for the developer why he should convert Test into Test<SomeKey>.
impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsRef<Test<NewParent>> for Test<OldParent> {
#[inline]
fn as_ref(&self) -> &Test<NewParent> {
unsafe { core::mem::transmute::<&Test<OldParent>, &Test<NewParent>>(self) }
}
}
impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsMut<Test<NewParent>> for Test<OldParent> {
#[inline]
fn as_mut(&mut self) -> &mut Test<NewParent> {
unsafe { core::mem::transmute::<&mut Test<OldParent>, &mut Test<NewParent>>(self) }
}
}
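One way to avoid the unsafe transmute sketched above is to rebuild the struct field by field, at the cost of a move instead of a reference cast. A minimal sketch with hypothetical key types (`KeyA`, `KeyB`) and a simplified `Test`:

```rust
use std::marker::PhantomData;

struct Test<Parent> {
    value: bool,
    value3: u32,
    _parent: PhantomData<Parent>,
}

// Rebuilding the struct field by field sidesteps the unsafe
// `transmute::<&Test<Old>, &Test<New>>` above: only the zero-sized
// PhantomData changes, so the compiler can check every field.
impl<Old> Test<Old> {
    fn with_parent<New>(self) -> Test<New> {
        Test {
            value: self.value,
            value3: self.value3,
            _parent: PhantomData,
        }
    }
}

struct KeyA;
struct KeyB;

fn main() {
    let t: Test<KeyA> = Test { value: true, value3: 9, _parent: PhantomData };
    let t2: Test<KeyB> = t.with_parent();
    assert!(t2.value);
    assert_eq!(t2.value3, 9);
}
```

The macro would generate `with_parent` alongside the struct; for borrowed access an `AsRef`-style conversion would still need per-field projection rather than a pointer cast.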
I think resolving https://github.com/paritytech/substrate/issues/7724 will close that question as we discussed in the element channel.
Yes. This will kill two birds with one stone.
Yep. I didn't think yet about the question: Is that possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove ManualKey mode, and only have Auto, ManualOffset. In that case, the code will not generate duplicate keys.
If it makes things easier we should drop the manual key for now. Right now the keys are also derived from order. We can add this later, maybe even as a post-processing step on metadata to not further complicate the code (as you initially suggested).
I also want to highlight another problem in that solution. Test is Test<()>, but Test<ManualKey<2>> is another structure. So if the user wants to work with the code that is generated for Test he should transform it from Test<ManualKey<2>> into Test<()>.
Yeah, this is kind of a bummer. I don't understand the code well enough to help with that right now. But I am sure that using transmute for this isn't the right way.
@xgreenx We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so to not be blocked on that. What are your thoughts?
We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so to not be blocked on that. What are your thoughts?
We need to decide on several questions and then I think we can move forward.
- Do we want to hide the generic type of the NextStorageKey on the storage structure or not?
1.1. We can add the generic type during codegen, as described above. By default it will be (). But if it is not an atomic struct, it will use the generic to have a correct storage key, and the user should understand that he needs to work with the generic struct.
Pros:
- Easy definition of the struct.
- Everything works as expected if it is an atomic struct. Impl sections will work correctly, and derive too.
Cons:
- If it is not an atomic struct, impl sections and derive will not work.
- The user will not understand why the compiler requires him to pass something into the struct and why he should convert from Struct<Key<1>> into Struct<()> or Struct<Key<2>>.
- If the user wants to write some advanced code he should be aware that we are adding a generic during codegen.
- Highlighting will not work in the IDE for a generic type.
1.2. The user should explicitly specify the generic type for NextStorageKey (in case the struct contains several generics, the NextStorageKey should be last). He doesn't need to use it in the struct; the codegen will use it.
Pros:
- We force the user to use generics from the start and he will use it everywhere and the code is clear for him if he understands what is going on.
- The user will write the right code that uses generic and storage keys will be right. So impls sections and derive will work.
- IDE highlighting works correctly.
Cons:
- We force the user to use generics from the start and he will use it everywhere and the code is not clear for him if he doesn't understand what is going on.
- Ugly definition of the struct.
- We need to pass the generic in every impl section and procedure-like function.
The definition looks like:
#[derive(Default, Debug)]
#[ink(storage)]
struct Test<Last: NextStorageKey> {
value: bool,
value2: bool,
value3: u32,
value4: u32,
}
- #[ink(storage)] is a procedural macro that will generate impls for helper traits. How do we want to identify the main contract's storage? For example, we can use #[ink::storage] to generate impls. But if it is the main contract's storage, then the user should mark it like #[ink::storage(main)] or #[ink::storage(contract)]. Or you can suggest your own ideas=)
- Maybe you have some suggestions about naming (for all the stuff)=D For example, NextStorageKey is too long; maybe Storage is enough.
1.1 vs 1.2: Given that these are the only two options I am leaning to the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation and is there really no better way (I can't tell)?
For example, we can use #[ink::storage] to generate impls. But if it is a main contract's storage, then the user should mark it like #[ink::storage(main)] or #[ink::storage(contract)].
I would suggest keeping #[ink::storage(<ROOT_KEY>)] for the main storage and using #[ink::storage_ext] for additional storage items. <ROOT_KEY> is optional and defaults to 0. We don't need a way to set keys of individual items, but the root key needs to be customizable, or otherwise we will have conflicts with delegate call.
Maybe you have some suggestions about naming(for all stuff)=D For example, NextStorageKey is too long, maybe Storage is enough.
I agree. If this is visible to the user, a simple name such as Storage would go a long way in not confusing them. In general it is very worthwhile to spend a lot of time on naming. Don't gloss over it; it is time well spent. That said, we can always change the names late in a review as these are simple find-and-replace changes.
1.1 vs 1.2: Given that these are the only two options I am leaning to the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation and is there really no better way (I can't tell)?
I also prefer the explicit solution, because the developer can google the Rust concepts involved and find out what is going on.
is there really no better way (I can't tell)
We want to know the key at compile time for each Storage or Mapping field -> each Storage or Mapping field should have its own storage key.
Here we have two options:
- The key is calculated relative to the previous keys, to create a unique key each time and avoid duplicates. Previously used cells should affect the storage key of the current struct -> the struct should know about its parent -> we need to use generics.
Pros:
- The user can use atomic and non-atomic structs many times.
Cons:
- We add generics everywhere.
- The codegen is harder than in option 2.
- It is harder to understand than option 2.
- The key is generated based on NAME_OF_STRUCT::NAME_OF_FIELD.
Pros:
- The codegen is easy to implement.
- We don't add any additional machinery to identify the storage key (we don't need generics).
- It is easy to calculate and understand.
Cons:
- The user can use a non-atomic struct only once in the code. If it is used twice, the keys will be duplicated.
- The name of each struct should be unique.
But we can detect during metadata generation that someone used the struct more than once and throw an error with a description, and the user should use another struct.
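For illustration, option 2 (name-based keys) can be sketched as a compile-time hash of "StructName::field_name". The fnv1a_64 helper and the field names below are assumptions for this sketch, not the actual ink! codegen:

```rust
// Sketch of option 2: derive each storage key from "StructName::field_name"
// at compile time. `fnv1a_64` is a hypothetical stand-in for whatever
// compile-time hash the macro would actually use.
const fn fnv1a_64(input: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    let mut i = 0;
    while i < input.len() {
        hash ^= input[i] as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a prime
        i += 1;
    }
    hash
}

// Each field gets its key baked in as a constant; no generics needed.
const KEY_VALUE1: u64 = fnv1a_64(b"Flipper::value1");
const KEY_VALUE2: u64 = fnv1a_64(b"Flipper::value2");

fn main() {
    // Different names yield different keys; the same name always yields
    // the same key, which is exactly the "struct used twice" collision
    // problem discussed above.
    assert_ne!(KEY_VALUE1, KEY_VALUE2);
    assert_eq!(fnv1a_64(b"Flipper::value1"), KEY_VALUE1);
    println!("key1 = {KEY_VALUE1:#x}, key2 = {KEY_VALUE2:#x}");
}
```

The collision concern falls out directly: a second instance of the same struct would hash the same names and produce the same keys.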
I would suggest keeping #[ink::storage(<ROOT_KEY>)] for the main storage and using #[ink::storage_ext] for additional storage items. <ROOT_KEY> is optional and defaults to 0. We don't need a way to set keys of individual items but the root key needs to be customizable or otherwise we will have conflicts with delegate call.
I'm okay with that solution=)
Can you please remind me what you mean when you say "atomic struct"?
I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.
I think before you start implementing, it would be best if you'd post a complete example here of how this stuff would be used. No need to write out all the types involved as in the first mock you posted - only from a user perspective, with Mapping and with Storage. Then we can check whether this is actually something that is usable or too complicated for the users.
"atomic struct" - all fields are atomic. All types are atomic except Storage
and Mapping
.
I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.
We still can use u32 or u64; the chance to generate the same key is not so high=)
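A rough birthday-bound estimate puts numbers on "not so high"; the helper below is napkin math, not a formal analysis:

```rust
// Birthday-bound approximation: for n random keys drawn from a space of
// 2^bits values, the collision probability is roughly n*(n-1) / (2 * 2^bits).
fn collision_probability(n: u64, bits: u32) -> f64 {
    let space = 2f64.powi(bits as i32);
    (n as f64) * ((n - 1) as f64) / (2.0 * space)
}

fn main() {
    // Assuming ~20 fields per contract, as estimated later in the thread:
    let p32 = collision_probability(20, 32);
    let p64 = collision_probability(20, 64);
    println!("~20 fields, u32 keys: p ≈ {p32:e}");
    println!("~20 fields, u64 keys: p ≈ {p64:e}");
    assert!(p32 < 1e-7);  // around 4.4e-8
    assert!(p64 < 1e-16); // around 1e-17
}
```

Note this only covers accidental collisions between distinct names; reusing the same struct still collides deterministically regardless of key width.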
I think before you start implementing, it would be best if you'd post a complete example here on how this stuff would be used. No need to write out all the types involved as in the first mock you posted. Only from a user perspective. With Mapping and with Storage. Then we can check whether this is actually something that is usable or to complicated for the users.
const ROOT_KEY: u32 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value1: AtomicStruct,
    value2: NonAtomicStruct,
    value3: Storage<AtomicStruct>,    // possible
    value4: Storage<NonAtomicStruct>, // possible
    value5: Mapping<u128, AtomicStruct>,
    // Compilation error, it is not possible:
    // value6: Mapping<u128, NonAtomicStruct>,
}

impl Flipper {
    fn get_value1(&self) -> &AtomicStruct {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &NonAtomicStruct<T> {
        &self.value2
    }

    fn get_value3<T: Storage>(&self) -> &Storage<AtomicStruct, T> {
        &self.value3
    }

    fn get_value4<T1: Storage, T2: Storage>(&self) -> &Storage<NonAtomicStruct<T2>, T1> {
        &self.value4
    }

    fn get_value5<T: Storage>(&self) -> &Mapping<u128, AtomicStruct, T> {
        &self.value5
    }
}
#[ink::storage_ext]
pub struct AtomicStruct<Last: Storage = ()> {
    value1: bool,
    value2: u32,
    value3: Vec<u32>,
    value4: String,
}

#[ink::storage_ext]
pub struct NonAtomicStruct<Last: Storage = ()> {
    value1: Mapping<u32, bool>,
    value2: Storage<Vec<u32>>,
}

impl<Last: Storage> NonAtomicStruct<Last> {
    fn get_value1<T: Storage>(&self) -> &Mapping<u32, bool, T> {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &Storage<Vec<u32>, T> {
        &self.value2
    }
}
We still can use u32 or u64, the chance to generate the same key is not so high=)
I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate it to 8 bytes? We would still have the problem of collisions if a struct is used multiple times.
I think with a numeric solution we could even get away with u8. We would make the Key generic and just use u8 by default. If we overflow, we throw a compile-time error and ask the user to specify a bigger type.
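A minimal sketch of how such a compile-time overflow error could look, using a const assertion; the names and the sequential key assignment are illustrative assumptions, not actual ink! codegen:

```rust
// Sketch of the "u8 keys by default" idea: the macro assigns sequential
// numeric keys and fails the build on overflow. The const assertion below
// stands in for that compile-time error.
const FIELD_COUNT: usize = 4;

// With more than 256 fields this stops compiling, and the user would be
// asked to switch the key type to something wider (e.g. u16).
const _: () = assert!(FIELD_COUNT <= (u8::MAX as usize) + 1);

// Sequentially assigned 1-byte keys stored in the data section.
const KEYS: [u8; FIELD_COUNT] = [0, 1, 2, 3];

fn main() {
    assert_eq!(KEYS.len(), FIELD_COUNT);
    assert_eq!(KEYS[3], 3);
    println!("keys: {KEYS:?}");
}
```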
Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.
Just one question: why is the type parameter on the storage structs called Last? It seems like a strange name.
I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate to 8 byte? We would still have the problem of collisions if a struct was used multiple times.
Yes, we can do the same as for selectors: we will truncate it to u32 or u64.
I think with a numeric solution we could even get away with u8. We would make the Key generic and just use u8 by default. If we overflow we throw a compile time error and ask the user to specify a bigger type.
In WASM it still is u32; I think it is better to keep u32.
Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.
Okay=) It looks normal to me. I only worry that someone coming from Solidity could be scared of it=)
Just one question: Why is the type parameter on the storage structs called Last? It seems like a strange name.
Because actually we will pass the last data type (the type that took the previous cell), which will be used to get the next storage key. We can use Parent, but it is not actually a parent=) We can simply call it Key: Storage or S: Storage. I'm not a master of naming=(
In WASM it still is u32, I think it is better to keep u32.
I think those constants will go into the wasm data section and can be packed densely there. Of course they will be loaded into an i32 when they are used, but we still save the space in the data section. We would need to verify that, though.
We can use Parent but it is not actually parent=) We can simply call it Key: Storage or S: Storage. I'm not master of the word=(
Some renaming suggestions:
- I would rename our concrete Storage type to StoredValue.
- Our Storage trait deals mainly with storage keys, if I understand it correctly. I suggest renaming it to StorageKey.
- The name of the type variable is then Key: StorageKey.
Do you have a preference regarding numeric keys vs truncated 64-bit hashes? My thoughts on truncated hashes:
Con:
- We need to store non-compressible 64-bit values for each field in our data section.
- Given that we use hash(CONTRACT_ROOTKEY ++ STRUCT_NAME ++ FIELD_NAME): we cannot use the same non-atomic struct twice. We would need an attribute to override the STRUCT_NAME in that case.
Pro:
- Simpler
- User doesn't need to deal with generics?
Let's do some napkin math regarding the size benefits of numeric keys. Let's say numeric keys take 1 byte of contract size (stored in the data section, which is linear memory) and truncated keys take 8 bytes. So we are looking at 7 bytes of overhead for each field.
How many fields does the average contract have? I would say fewer than 20 (maybe even way fewer). That would be 140 bytes of size overhead for the truncated-hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had StoredValue already. I would assume the number of fields wouldn't go up indefinitely, because you would group them into atomic structs for efficiency reasons.
Given these numbers I might be inclined to say that truncated hash version might be better if it is substantially simpler and gets rid of generics for the user.
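The napkin math above, written out as a small helper (per-field overhead = 8-byte truncated hash minus 1-byte numeric key):

```rust
// Data-section overhead of truncated-hash keys over numeric keys,
// using the 1-byte and 8-byte estimates from the thread.
fn data_section_overhead(fields: u64) -> u64 {
    let hash_key_bytes = 8;    // truncated 64-bit hash
    let numeric_key_bytes = 1; // u8 numeric key
    fields * (hash_key_bytes - numeric_key_bytes)
}

fn main() {
    assert_eq!(data_section_overhead(20), 140); // "average" contract estimate
    assert_eq!(data_section_overhead(4), 28);   // ERC20-sized
    assert_eq!(data_section_overhead(7), 49);   // multisig-sized
    println!("overhead for 20 fields: {} bytes", data_section_overhead(20));
}
```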
Given these numbers I might be inclined to say that truncated hash version might be better if it is substantially simpler and gets rid of generics for the user.
I also prefer the version with the truncated hash, because it will be simpler for me as a smart contract developer. I don't like putting generics everywhere in cases where I plan to use only atomic structs (or a non-atomic struct, but only once).
How many fields does the average contract have? I would say less than 20 (maybe even way less). That would be 140 byte size overhead for the truncated hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had StoredValue already. I would assume the amount of fields wouldn't go up indefinitely because you would group them into atomic structs for efficiency reasons.
Yea, the overhead seems not too high + the 1-byte assumption requires investigation=)
Given that we use hash(CONTRACT_ROOTKEY ++ STRUCT_NAME ++ FIELD_NAME): We cannot use the same non atomic struct twice. We would need an attribute to override the STRUCT_NAME it in that case.
If someone wants to use some struct twice or more, they can specify the generic themselves.
/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, ManualKey<hash_u64!("Test::value"), Key>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Test::value2"), Key>>,
    value3: StoredValue<u32, ManualKey<hash_u64!("Test::value3"), Key>>,
    value4: StoredValue<u32, ManualKey<hash_u64!("Test::value4"), Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
Having to manually key every field in TestTwice seems a bit extreme just so we can use it twice. Couldn't the supplied const Key be picked up by the automatic key generation? The macro should be able to detect that there is a const Key specified and use it.
We can add an XOR part to AutoKey:
/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;

/// The auto-generated storage key (based on the name of the struct and field) will be XORed with the `XOR` value.
struct AutoKey<const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32, AutoKey<42>>,
    value4: StoredValue<u32>,
}
The macro should be able to detect that there is a const Key specified and use it.
We can try to automate that process. For that, we need some marker to understand that a generic is meant to be the storage key. Maybe we can define an alias in the ink_storage crate like pub type StorageKey = u64 and check that the type in the const generic is StorageKey (it will not work if the user aliases it again).
Then we can simplify:
#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}
Into:
#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool>,
    value2: StoredValue<bool>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
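A hand-expanded sketch of what the simplified form could generate, in plain Rust; the StorageKey alias follows the suggestion above, while the placeholder field hashes and the XOR scheme are assumptions for illustration:

```rust
// Sketch: the macro sees `const KEY: StorageKey` on the struct and threads
// it into every field key automatically. This is a manual expansion of
// what it could generate, not real ink! codegen.
pub type StorageKey = u64;

#[allow(dead_code)]
struct TestTwice<const KEY: StorageKey> {
    value: bool,
    value2: u32,
}

impl<const KEY: StorageKey> TestTwice<KEY> {
    // What AutoKey<KEY> would resolve to per field; the literals stand in
    // for hashes of "TestTwice::value" and "TestTwice::value2".
    const VALUE_KEY: StorageKey = 0x1111 ^ KEY;
    const VALUE2_KEY: StorageKey = 0x2222 ^ KEY;
}

fn main() {
    // Two instantiations of the same struct get disjoint key sets:
    assert_ne!(TestTwice::<1>::VALUE_KEY, TestTwice::<2>::VALUE_KEY);
    // And within one instantiation, fields keep distinct keys:
    assert_ne!(TestTwice::<1>::VALUE_KEY, TestTwice::<1>::VALUE2_KEY);
    println!("ok");
}
```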
Yes, this is what I had in mind. Sure, we can't detect the re-alias, but I don't think this is a case we need to consider.
This all looks pretty solid to me. The latest example looks like something I can very much live with. We should have the rest of the ink! team look over it so we are all on the same page.
This discussion is pretty dense, but I'll try and take a look later this week
@xgreenx Regarding #[ink(storage)] and #[ink::storage_ext]: in an ideal world we would need only #[ink(storage)] and detect whether the struct is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one storage attribute?
@xgreenx Regarding #[ink(storage)] and #[ink::storage_ext]: in an ideal world we would need only #[ink(storage)] and detect whether the struct is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one storage attribute?
I don't think that it is possible. I think it is better to add some kind of marker for the root contract, like #[ink(storage, root)]. We can ask the developer to always specify the root_key for the root storage: #[ink(storage, root_key = 0)]=)
and detect if the struct is meant to be the top-level storage item or a nested one
I only see two ways:
/// What xgreenx suggested
#[ink(storage)] + #[ink(storage, root, root_key = <OPTIONAL_ROOT_KEY>)]
/// Define the root struct by name
/// I favor this one because it also prevents declaring two structs as the root struct.
#[ink(storage)] + #[ink(contract, storage = struct_name, root_key = <OPTIONAL_ROOT_KEY>)]
How else would you detect the root struct? You could always just define the "first struct" as the root struct, but I don't think this would be a better solution.
We included that issue in our roadmap, so we are waiting for approval from the ink! team to start work on it=)
Okay, so I finally got around to taking a read today. I can't say I follow everything, but I have some initial thoughts and questions:
- What is this AtomicResolverHelper?
- What is this supposed to indicate: const IS_VALUE_ATOMIC: bool;
  - So atomic ~= primitive types?
  - How's this different from a StorageValue<$primitive_type>?
- Maybe the generic in Test<Parent> should be Key or RootKey instead?
- I like that CELL_USED is calculated at compile time
- Appreciate that your code runs
- Need to re-read the discussion about why macro ordering matters
- Agree, we can do the manual keys later; auto-keys seem like a good start
- I don't mind forcing users to use generics, they're a pretty basic Rust construct
  - Main advantage here is IDE support, which I think makes up for the "confusion"
- #[ink::storage(<ROOT_KEY>)] and storage_ext seem fine to me
  - Maybe rename storage_ext to storage_item?
- Don't understand the discussion here and here
  - May just need another read
Direction looks really good in general though!