aptos-core [Framework] Create storage account

@davidiw @junkil-park @lightmark @movekevin @msmouse @vgao1996

This PR adds storage account functionality, analogous to malloc in C.

Presently, global storage allocation can be done in one of several ways, which all have limitations:

Using a table requires expensive handle creation and hashing operations during lookup, and looking up data in the REST API is difficult with nested table handles.
Using a resource account requires the allocation of an aptos_framework::account::Account, which does not have a global resource group and thus leads to two global storage item creations during allocation.
Using an object requires creating an aptos_framework::object::ObjectCore, which can be incorporated in a resource group but which occupies almost 100 bytes. The ObjectCore can be deleted, but it requires several operations:
- object::create_object()
- object::generate_signer()
- move_to<ApplicationDataResource> using the object account (AUID) signer, under the ObjectGroup resource group (to avoid having two global storage item creations)
- object::generate_delete_ref()
- object::delete()

Like during object creation, the proposed storage account functionality relies on an AUID, but is much more straightforward than the above process required for object-based storage allocation.

Sep 17 '23 17:09 alnoki

For example, I think ultimately we want some sort of global reference to a storage unit that isn't just address but implies what is being stored there. This helps with building safer (entry) functions. I could imagine a more general component, like Storage<T> to replace Object<T>. Where there's guaranteed to be Storage<storage::Core>. We could make storage::Core special such that it requires a minimal state in the storage model -- I guess 1 byte?

Should we move toward a model in resource groups, where we store the hash of a resource type instead of the entire path? Or some other model for better efficiency? I think we should also not worry too much about the 1KB. I think @msmouse and team are already comfortable enough to bump it up a bit.

Sep 17 '23 18:09 davidiw

@davidiw thanks for the above

I'd like to take this more to an AIP as a starting point rather than rush in and create new pains.

Maybe we can jot down some shared notes and create something exactly or similar to this.

Let me push out some initial thoughts into an issue.

Standing by for another place to capture discussion, in the meantime I'll consolidate notes here.

I think one thing that might be nice is making it such that there is at least something here that indicates that this is an allocated storage account, e.g., an empty resource group struct.

Note that presently there is no way to prevent the publication of bare data:

[package]
name = "BareDataAllocator"
version = "0.1.0"
upgrade_policy = "immutable"
authors = ["alnoki ([email protected])"]

[addresses]
publisher = "_"

[dependencies.AptosFramework]
git = "https://github.com/aptos-labs/aptos-core.git"
rev = "testnet"
subdir = "aptos-move/framework/aptos-framework"

module publisher::bare_data {

    use aptos_framework::object;
    use aptos_framework::event;
    use std::signer;

    const DATA: vector<u8> = b"Hello David";

    struct BareData has key {
        data: vector<u8>
    }

    #[event]
    struct BareDataEvent has drop, store {
        seeder: address,
        bare_data_address: address,
    }

    public entry fun allocate_bare_data(seeder: &signer) {
        // Hack object API to get a bare signer.
        let constructor = object::create_object(signer::address_of(seeder));
        let bare_account = object::generate_signer(&constructor);
        let deletor = object::generate_delete_ref(&constructor);
        object::delete(deletor);
        // Move bare data to bare signer account.
        move_to(&bare_account, BareData { data: DATA });
        // Emit event about bare data.
        let seeder = signer::address_of(seeder);
        let bare_data_address = signer::address_of(&bare_account);
        event::emit(BareDataEvent { seeder, bare_data_address });
    }
}

BareDataAllocator package published on testnet: https://aptos-explorer.netlify.app/account/0xaf7d1e1f4069c1ba40450f05e9b48c2e0218f373d4538399c405402a1a7d16f3/modules/code/bare_data?network=testnet
Bare data allocation txn (no ObjectCore in writeset): https://aptos-explorer.netlify.app/txn/680124164?network=testnet
(The explorer and the CLI aptos account list both fail to render the bare data account, presumably because there is no aptos_framework::account::Account there, but the writeset on the allocation txn shows the new BareData at the bare data address)

Compared with what I'm submitting in this PR it's a little more expensive to get a bare storage account by hacking the object model per the above snippet, but the object model hack nevertheless enables malloc into global storage without the overhead of an ObjectCore or any of the other issues I describe in the PR description. Hence I submitted this PR to simply reduce the number of function calls required for the end result of bare data allocation.

For example, I think ultimately we want some sort of global reference to a storage unit that isn't just address but implies what is being stored there. This helps with building safer (entry) functions. I could imagine a more general component, like Storage<T> to replace Object<T>. Where there's guaranteed to be Storage<storage::Core>.

I understand the appeal here in terms of enforcing safe practices for existence checks etc. on higher-level application data structures, but with regard to what I might call "raw memory" (like a tree node in the unit test I wrote for this PR), I consider anything extra to be overhead. Since it is already possible to get a bare account (essentially a pointer into malloc memory) using the object model hack, I would personally favor this approach for things like allocating a new leaf node in a B+ tree.

And so I think we've stumbled on the classical debate of pointers versus borrow checking: I've essentially figured out how to do pointers in the existing memory model and I'm just trying to make it even more efficient. Whereas it sounds like you're suggesting the implementation of a more borrow-checker like approach to raw data, which I think is also a good feature to have. So maybe we need two AIPs?

Make pointer-like data operations more efficient (what I'm proposing in this PR) per The Spirit of C (trust the programmer, keep the language small and simple, make it fast)
Implement what is effectively a borrow checker on raw memory

We could make storage::Core special such that it requires a minimal state in the storage model -- I guess 1 byte?

Should we move toward a model in resource groups, where we store the hash of a resource type instead of the entire path? Or some other model for better efficiency?

Thanks for bringing this up, as I overlooked the storage overhead for the resource type because I can't seem to find any documentation on this topic. As I understand it from your explanation, in the current model the storage overhead for Hey will be one more byte than for Yo (on top of the BCS representation) assuming the structs are defined in the same address and module.

So yes I would favor special treatment for storage::Core and also compression techniques for other resource types: perhaps the hash or the full path, whichever is smaller (e.g. 0x1::account::Account is less overhead than a hash). So this should be a third storage-associated AIP:

Reduce resource type global storage overhead

Any extra documentation on how storage gets consumed would be much appreciated (e.g. is there anything else that eats up data in the writeset beyond the BCS serialization and the resource type name?)

I think we should also not worry too much about the 1KB. I think @msmouse and team are already comfortable enough to bump it up a bit.

Any visibility on this would also be greatly appreciated, as the free bytes quota is probably the single-most impactful constant in terms of gas optimization, since it determines, for example, the fanout on a B+ tree (node order) and therefore the number of borrows required during the search process.

Sep 18 '23 12:09 alnoki

Still waking up, but wrote this yesterday https://github.com/aptos-labs/aptos-core/issues/10083

Sep 18 '23 12:09 davidiw

Any visibility on this would also be greatly appreciated, as the free bytes quota is probably the single-most impactful constant in terms of gas optimization, since it determines, for example, the fanout on a B+ tree (node order) and therefore the number of borrows required during the search process.

https://github.com/aptos-labs/aptos-core/issues/10092

Sep 18 '23 16:09 msmouse

This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.

Nov 03 '23 01:11 github-actions[bot]

This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.

Dec 19 '23 01:12 github-actions[bot]

aptos-core aptos-core copied to clipboard

[Framework] Create storage account

aptos-core
aptos-core copied to clipboard