
Investigation: How to safely remove outcome assets ("garbage collect") over multiple blocks?

Chralt98 opened this issue on Sep 14 '22

Problem

Zeitgeist should provide a fast environment in which outcome assets are created and unused ones are efficiently destroyed.

Substrate currently uses a Base-16 Modified Merkle Tree ("Trie") data structure on top of a key-value database. Each storage read/write, in this case the deletion of an outcome asset, therefore takes O(log(n)), where n is the number of nodes in the merkle tree, which makes storage accesses more expensive than in a plain key-value database. However, to store balances for traders in a prediction market, we use a StorageDoubleMap from orml_tokens. To properly delete all balances for a specific currency/asset, we are left with using the storage API.

The current challenge is that the orml_tokens StorageDoubleMap maps from AccountId to CurrencyId, which requires us to iterate over all accounts; this is unsustainable. That's why we tried to cache all accounts that hold a specific currency id. This adds extra storage plus expensive reads and writes, and will not scale well in the long run as long as the storage layout stays as it is. There is no change in sight that would allow us to have a StorageDoubleMap from CurrencyId to AccountId, because orml_tokens is optimised for querying one specific AccountId rather than one specific CurrencyId.
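
For reference, the relevant storage item in orml_tokens looks roughly like this (a simplified sketch; hasher choices may differ between versions):

// Simplified sketch of orml_tokens' balance storage: the first key is the
// account and the second key is the currency, so collecting all holders of
// one currency means walking every account.
#[pallet::storage]
pub type Accounts<T: Config> = StorageDoubleMap<
    _,
    Blake2_128Concat,
    T::AccountId,   // first key: account
    Twox64Concat,
    T::CurrencyId,  // second key: currency
    AccountData<T::Balance>,
    ValueQuery,
>;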

In addition, orml_tokens is really important for Zeitgeist, because it allows handling tokens from other chains (like aUSD) in the same way as pallet_balances does (with NamedMultiReservableCurrency or MultiLockableCurrency). This needs to be pointed out in case we consider using pallet_assets (maintained by Parity), which does not have all the functionality of orml_tokens. pallet_assets offers a destroy function, which is exactly what we need (at first glance). I therefore evaluated the use of pallet_assets for the outcome assets, but found that too many accounts can't be deleted in one block (info down below). The need to use orml_tokens for the xcm tokens persists. If we consider a solution based on another pallet or a custom implementation, we need to decouple the outcome asset logic from orml_tokens and migrate it to that other pallet / custom implementation. I would strongly suggest that, because Zeitgeist depends on an efficient way to delete unused / finished outcome assets. I know that the ideal solution to this problem would take O(1) time, but then we couldn't use the tree layer from Substrate and would need to implement or use another storage layer just for the outcome asset information (asset_id -> account -> balance).

Try with pallet_assets

pallet_assets provides a do_destroy function, which takes a DestroyWitness as argument and returns the resulting DestroyWitness. One can specify the number of accounts, which acts as the maximum limit for deletions: if the number of accounts stored for the asset in the Asset storage map exceeds the number specified in the witness, the extrinsic fails. The accounts counter of the Asset storage map is incremented here. This means that we can't really use do_destroy, because an asset id could have too many accounts for one block. do_destroy only allows destroying all account information, or destroying nothing and returning BadWitness if the witness parameter accounts is too small. If we specify it as u32::MAX and there are u32::MAX accounts for a specific asset id, the blockchain has to do too much work in one extrinsic. I created an issue for this here.
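
For context, the witness type looked roughly like this at the time (a sketch based on the pallet_assets docs; compact encoding attributes omitted, and the field set may differ between versions):

// Sketch of pallet_assets' DestroyWitness.
pub struct DestroyWitness {
    /// The number of accounts holding the asset. `do_destroy` fails with
    /// `BadWitness` if the actual number of accounts is greater than this.
    pub accounts: u32,
    /// The number of accounts holding the asset whose existence is
    /// "sufficient" (not backed by an existential deposit).
    pub sufficients: u32,
    /// The number of outstanding transfer approvals for the asset.
    pub approvals: u32,
}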

However, one way or another, we need to iterate over and remove a map structure in Substrate. To remove items from a StorageDoubleMap, one can use drain_prefix or clear_prefix. drain_prefix is relevant if we'd like to inspect each element before removing it; clear_prefix just removes the elements without looking at them. Because the number of accounts that hold balances for a specific outcome asset is unbounded, but a blockchain is a constrained environment, we need to limit the number of deletions. This leads us to distribute the required effort over multiple blocks.
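
To illustrate the difference, a sketch assuming (as in the examples below) a map whose first key is the asset id; asset_id and limit are placeholders:

// drain_prefix yields every removed (second_key, value) pair, so each
// deleted entry can still be inspected ...
for (account_id, account_data) in <Accounts<T>>::drain_prefix(&asset_id) {
    // e.g. emit an event per removed account
}
// ... while clear_prefix removes up to `limit` entries in bulk and only
// reports counts plus a resumption cursor.
let results = <Accounts<T>>::clear_prefix(&asset_id, limit, None);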

First Solution Approach

The clear_prefix function serves us well in this regard, because it tells us when to stop the deletion process for a specific first_key (in our case the outcome asset id):

Once the resultant maybe_cursor field is None, then no further items remain to be deleted.

This also requires us to block every possible item insertion for this particular asset id. For example: no owner of an outcome asset from a resolved market should be able to trade or mint the related outcome assets for this market.

NOTE: After the initial call for any given map, it is important that no further items are inserted into the map which match the first_key. If so, then the map may not be empty when the resultant maybe_cursor is None.

Then there is the cursor function parameter, which points to the location where removal should continue:

None should only be passed once (in the initial call) for any given storage map and first_key. Subsequent calls operating on the same map/first_key should always pass Some, and this should be equal to the previous call result’s maybe_cursor field.

Example usage of clear_prefix:

#[pallet::storage]
pub type GarbageAssets<T: Config> = StorageValue<
    _,
    BoundedVec<AssetId, T::GarbageAssetsLimit>,
    ValueQuery,
>;

/// The cursors required to delete outcome assets over multiple blocks. 
#[pallet::storage]
pub type GarbageAssetCursors<T: Config> = StorageMap<
    _,
    Blake2_128Concat,
    AssetId,
    Vec<u8>,
    OptionQuery,
>;

...

let mut garbage_assets = <GarbageAssets<T>>::get();
if garbage_assets.is_empty() {
    return;
}
// use a config constant here instead of 500
let mut iteration_limit: u32 = 500;
let mut cleared_indices: Vec<usize> = Vec::new();
for (index, asset_id) in garbage_assets.iter().enumerate() {
    // Bind the cursor first so the `Vec<u8>` outlives the `&[u8]` passed on.
    let cursor: Option<Vec<u8>> = <GarbageAssetCursors<T>>::get(asset_id);
    let removal_results: MultiRemovalResults =
        <Accounts<T>>::clear_prefix(asset_id, iteration_limit, cursor.as_deref());
    iteration_limit = iteration_limit.saturating_sub(removal_results.loops);
    if let Some(new_cursor) = removal_results.maybe_cursor {
        <GarbageAssetCursors<T>>::insert(asset_id, new_cursor);
    } else {
        <GarbageAssetCursors<T>>::remove(asset_id);
        cleared_indices.push(index);
    }
    if iteration_limit.is_zero() {
        break;
    }
}

// Remove the cleared assets in reverse order, so that removing an element
// does not shift the positions of the indices still to be processed.
for i in cleared_indices.into_iter().rev() {
    if i < garbage_assets.len() {
        garbage_assets.remove(i);
    } else {
        log::error!("The index {} was not found in the garbage_assets vector. This should never happen.", i);
    }
}

<GarbageAssets<T>>::put(garbage_assets);

The above is a possible solution for "garbage collection" over multiple blocks. In that case we need the storage items GarbageAssets and GarbageAssetCursors to persist the deletion state between blocks.

Approach to query the remaining block weight

Ideally we'd like to "garbage collect" resolved outcome assets whenever there is space left in a block, because that approach wouldn't block the usage of the chain just for the garbage collection. One could think about querying the block weight in on_finalize. Alternatively, we could delete an equal portion of assets in each block in on_initialize. But the best solution all in all is to use the hook on_idle, which receives the remaining_weight and is called before the on_finalize hook. Keep in mind that on_idle and on_finalize belong to the mandatory dispatch class, which allows an "unlimited" block weight. Therefore we are responsible for correctly calculating the weight we'd like to spend on the garbage collection.

First Approach to calculate the remaining weight:

let dispatch_class = frame_support::weights::DispatchClass::Operational;
let consumed_weight = *<frame_system::Pallet<T>>::block_weight().get(dispatch_class);
let weight_left: Weight = match T::BlockWeights::get().get(dispatch_class).max_total {
    Some(max_weight) => max_weight.saturating_sub(consumed_weight),
    // `None` means there is no `max_total` limit for this class;
    // be conservative and assume no weight is left.
    _ => 0u64,
};

Best approach (use the pallet hook on_idle):

fn on_idle(_n: BlockNumber, remaining_weight: Weight) -> Weight {
    let (iterations_number, cleared_assets_number) = do_garbage_collection(remaining_weight);
    T::WeightInfo::do_garbage_collection(iterations_number, cleared_assets_number)
}
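
A minimal sketch of how do_garbage_collection could translate the remaining weight into an iteration budget for the clear_prefix loop shown earlier; weights are assumed to be plain u64 here, matching the snippet above, and the per-iteration weight would come from a hypothetical benchmark:

// Sketch: derive the `iteration_limit` for the multi-block deletion loop
// from the weight still available in the block.
fn iteration_budget(remaining_weight: u64, weight_per_iteration: u64) -> u32 {
    remaining_weight
        .checked_div(weight_per_iteration) // avoid division by zero
        .unwrap_or(0)
        .min(u32::MAX as u64) as u32
}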

Extrinsic approach

To avoid the additional storage information of cursors and garbage assets, Zeitgeist could provide an extrinsic that takes the garbage asset id as a parameter. The extrinsic could check whether the specified asset id belongs to a resolved market (and therefore counts as a garbage asset). In that case, the information about the garbage assets is not required on-chain (same for the cursors). Zeitgeist could call this extrinsic whenever there is enough weight left in a block. It could be a DispatchClass::Operational extrinsic to allow more block weight, and it could charge no fee (Pays::No).

#[pallet::weight((5000, DispatchClass::Operational, Pays::No))]
#[transactional]
pub fn do_garbage_collection(
    origin: OriginFor<T>,
    market_id: T::MarketId,
    asset_id: Asset<T::MarketId>,
) -> DispatchResult {
    ensure_root(origin)?;
    // TODO: check if corresponding market assets are ready for garbage collection
    let mut garbage_left = false;
    let mut drain_counter = 0u32;
    for _ in <Accounts<T>>::drain_prefix(asset_id) {
        drain_counter += 1;
        if drain_counter >= T::RemoveKeysLimit::get() {
            garbage_left = true;
            break;
        }
    }
    if garbage_left {
        // emit an event that items remain to be garbage collected
    } else {
        // emit an event that all items were removed
    }
    Ok(())
}

Chralt98 avatar Sep 14 '22 07:09 Chralt98

We currently remove the deletion of outcome assets in this PR: https://github.com/zeitgeistpm/zeitgeist/pull/806.

@sea212 gave a great explanation of how to remove the orml_tokens outcome assets in a storage migration.

The market will be deleted completely from storage later, and we won't know which outcome assets to delete in the future. I think this is fine though, since we can deduce which markets got deleted (market ids are incremental, and ids that no longer return a market must belong to deleted markets) and then iterate through all accounts in a (potentially multi-block) migration and delete the outcome assets whose market id belongs to one of the deleted markets.
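
A minimal sketch of the deduction step, with Markets and MarketCounter as assumed names for the market map and the incremental id counter:

// Sketch: every id below the counter that no longer resolves to a market
// belongs to a deleted market. `Markets` and `MarketCounter` are assumed
// names for illustration.
let market_count = <MarketCounter<T>>::get();
let deleted_market_ids: Vec<MarketId> = (0..market_count)
    .filter(|id| <Markets<T>>::get(id).is_none())
    .collect();
// A (potentially multi-block) migration would then remove all outcome asset
// balances whose market id is contained in `deleted_market_ids`.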

Chralt98 avatar Sep 29 '22 10:09 Chralt98

pallet_assets improvement https://github.com/paritytech/substrate/pull/12310

vivekvpandya avatar Oct 11 '22 05:10 vivekvpandya

Some examples of multi-block execution:

  • Runtime Task Executor: https://github.com/paritytech/substrate/pull/8197
  • fast-unstake pallet: https://github.com/paritytech/substrate/tree/master/frame/fast-unstake
  • pallet-contracts: https://github.com/paritytech/substrate/blob/master/frame/contracts/src/lib.rs

I propose we design a multi-block execution pallet that provides an interface other pallets can use to schedule multi-block executions. The calling pallet provides the function (which must take the available weight as a parameter and return whether it completed), and the multi-block execution pallet keeps track of all registrations and schedules them during on_idle.
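
A rough sketch of what that interface could look like (all names here are illustrative; nothing in this snippet exists yet):

// Illustrative trait for the proposed multi-block execution pallet: a task
// receives the weight it may consume and reports back how much it used and
// whether it finished, so the scheduler can re-queue unfinished tasks.
pub trait MultiBlockTask {
    /// Run with at most `available_weight`. Returns the weight actually
    /// consumed and `true` once the task has completed.
    fn execute(available_weight: Weight) -> (Weight, bool);
}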

sea212 avatar Oct 12 '22 08:10 sea212

pallet_assets currently does not have the exact same functionality that we have with orml_tokens now.

Look at the current API for pallet_assets https://docs.rs/pallet-assets/latest/pallet_assets/index.html

There you can see that we are missing some kind of reserve functionality to deposit a bond. I added an issue here on the substrate repository.

My current plan: we wait and deal with the problem when we need to scale (my preferred option), and potentially the functionality gets added to the pallet_assets module along the way. If it is really required to clean up the dead assets, then we can manually do a storage migration, or we implement additional caching like Vivek did in his PR.

Chralt98 avatar Jan 28 '23 14:01 Chralt98


Solved in https://github.com/zeitgeistpm/zeitgeist/pull/1201

sea212 avatar Feb 13 '24 07:02 sea212