
Desub V2


I'd like to revive Desub. Given the various things that have changed in the last couple of years, I'm proposing to restructure and rework the library with the following rough goals:

  • To expose an as-simple-as-possible interface for decoding arbitrary blocks/state. With the right configuration it should be possible to tie this in with pulling from a DB or from RPCs to index chain state/blocks in some arbitrary way.
  • To be able to reuse the extrinsic/state decoding logic in eg Subxt (without pulling in a load of unnecessary stuff).
  • To lean on the modern libraries we have nowadays like scale_value and scale_decode.
  • Excellent errors, examples, and a CLI utility to help one iteratively construct/add type mappings for a chain (but we'll provide default mappings from PJS which should get things going).

Some more specific details:

Decoding arbitrary bytes

This is the core thing that needs doing, regardless of metadata version. To be general over how we decode types, TypeDecoder is a trait that can be implemented on anything capable of decoding bytes into outputs.

It might look something like:

trait TypeDecoder {
    /// Something that identifies a type. In V14/V15 metadata
    /// this might be a u32, and in earlier versions it might be
    /// some struct with chain and type name (maybe spec too).
    type TypeId;

    /// Error type we'll return if decoding fails.
    type Error;

    /// Given a type ID and a cursor, attempt to decode the type into a
    /// `scale_value::Value`, consuming from the cursor. The Value will
    /// contain the `TypeId` as context, so that we have information on the
    /// type that the Value came from.
    fn decode_type<V: scale_decode::Visitor>(
        &self,
        type_id: &Self::TypeId,
        bytes: &mut &[u8],
        visitor: V
    ) -> Result<V::Value, Self::Error>;
}

This could be implemented directly on a scale_info::PortableRegistry. For older versions of metadata, we'll need a set of manual type mappings and we'll implement it on that.

Hopefully decode_type can take a scale_decode::Visitor, and will call that as needed. This would allow us to do things like skip over bytes for more decode flexibility, or decode to scale_value::Values or whatever, and generally leverage the Visitor stuff better. In V14 and V15 this would all "just work", but we'd need to implement a visitor based decoder to work with legacy type mappings.
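To make that concrete, an impl for legacy type mappings might look vaguely like this (LegacyTypeMappings, LegacyTypeId and LegacyDecodeError are hypothetical names, and the body is elided):

// Hypothetical: in legacy metadata a type is identified by a name,
// optionally scoped to a pallet.
pub struct LegacyTypeId {
    pub pallet: Option<String>,
    pub name: String,
}

// Hypothetical container for the manual name -> shape type mappings.
pub struct LegacyTypeMappings { /* ... */ }

impl TypeDecoder for LegacyTypeMappings {
    type TypeId = LegacyTypeId;
    type Error = LegacyDecodeError;

    fn decode_type<V: scale_decode::Visitor>(
        &self,
        type_id: &Self::TypeId,
        bytes: &mut &[u8],
        visitor: V,
    ) -> Result<V::Value, Self::Error> {
        // Look up the shape registered against `type_id`, walk the bytes
        // according to that shape, and call the relevant visitor methods
        // as each value is encountered.
        todo!()
    }
}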

Decoding extrinsics

We can use the type decoder above, as well as something capable of getting the relevant type IDs, to decode extrinsics:

/// Provide the type information needed to decode extrinsics. This allows
/// us to write `decode_extrinsic_v4` once for all metadata versions.
trait ExtrinsicTypes {
    /// Expected to be compatible with a `TypeDecoder`, so that we can
    /// use the type IDs to actually decode some things.
    type TypeId;

    // Signature type
    fn signature_type(&self) -> Self::TypeId;

    // Address type
    fn address_type(&self) -> Self::TypeId;

    // Info on how to decode each of the signed extra types.
    fn signed_extra_types(&self) -> impl Iterator<Item = (&str, Self::TypeId)>;

    // Names of each argument and the type IDs to help decode them.
    fn argument_types(&self, pallet_id: u8, call_id: u8) -> impl Iterator<Item = (&str, Self::TypeId)>;

}

// Now, combining the above with a compatible type decoder should let us decode extrinsics.
fn decode_extrinsic<D, E, Id>(extrinsic_types: &E, decoder: &D, bytes: &mut &[u8]) -> Result<ExtrinsicDetails<Id>, Error<D::Error>>
where
    D: TypeDecoder<TypeId = Id>,
    E: ExtrinsicTypes<TypeId = Id>
{
    // We should be able to write this logic once, and then reuse it for any V4 extrinsic
}

Potentially decode_extrinsic could take a visitor too, to define how the args are decoded. Or it could just store byte offsets for things after decoding everything with an IgnoreVisitor, and allow the user to then decode each signed extension etc into Values or concrete types as needed (one might want to decode eg CheckMortality into a concrete type, but take the call data into Values).
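For the byte-offset flavour, the decoded output might look vaguely like this (all field names are illustrative):

use std::ops::Range;

// Hypothetical output of `decode_extrinsic` if it records byte ranges
// rather than eagerly decoding everything:
pub struct ExtrinsicDetails<Id> {
    pub pallet_index: u8,
    pub call_index: u8,
    // `None` for unsigned extrinsics:
    pub address_range: Option<Range<usize>>,
    pub signature_range: Option<Range<usize>>,
    // Name, byte range and type ID for each signed extension and each
    // call argument, so that users can decode them lazily into Values
    // or concrete types as needed:
    pub signed_extras: Vec<(String, Range<usize>, Id)>,
    pub call_args: Vec<(String, Range<usize>, Id)>,
}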

Decoding storage

I imagine that we can follow the same pattern as for decoding extrinsics; a trait to fetch whatever information we need and hand it back in some generic way, and then a decode_storage type call which can decode into some StorageDetails struct.
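A sketch of that trait, mirroring ExtrinsicTypes (StorageHasher stands in for however we end up describing key hashers):

/// Provide the type information needed to decode storage keys/values.
trait StorageTypes {
    /// As above, expected to be compatible with a `TypeDecoder`.
    type TypeId;

    // The hasher and key type for each part of the storage key
    // (empty for plain storage values).
    fn key_types(&self, pallet: &str, entry: &str) -> impl Iterator<Item = (StorageHasher, Self::TypeId)>;

    // The type of the stored value itself.
    fn value_type(&self, pallet: &str, entry: &str) -> Self::TypeId;
}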

General structure

We might end up with the following general crate/export structure:

// The core decode logic.
use desub::{
    // The core traits:
    TypeDecoder,
    ExtrinsicTypes,
    StorageTypes,
    // The core functionality we expose is quite simple; start with eg:
    decode_extrinsic,
    decode_storage_entry,
    Error,
    // Define the type mapping stuff that legacy metadatas need to be
    // able to decode types, and impl the above traits on the relevant
    // decoding and old metadatas.
    #[cfg(feature = "legacy")]
    legacy::{
        LegacyTypeDecoder,
        // legacy impls for the above live here too; if using this in
        // Subxt we should be able to not include the legacy stuff
    },
    #[cfg(feature = "current")]
    current::{
        // Probably just impls of the traits for V14/V15 metadatas here.
    }
};

After this initial restructuring is done, we might also end up wanting to add:

  • A desub CLI utility tool which can:
    • Scan a node to find where all of the spec version change (or runtime update) blocks are, and which metadata version is in use for each spec version (binary chopping to locate the blocks with the diffs shouldn't take too long; see the sketch after this list). Could dump this all to a file, and dump the metadatas too. It would then be very easy to scan and decode all blocks (assuming correct type mappings).
    • Check whether an RPC connection is an archive node (maybe by querying for block 1 hash and then seeing whether we can get state there)
    • Sample and attempt to decode blocks in each different spec version pre-V14 to help a user to construct and validate type mappings and spot errors. (maybe allow to scan in one spec version at a time to help the building of mappings)
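For the spec-version scanning, the binary chop might look something like this (spec_version_at is a hypothetical helper that fetches the runtime spec version at a given block, eg via the state_getRuntimeVersion RPC):

// Given two blocks known to have different spec versions, binary search
// for the first block at which the newer spec version applies:
fn find_spec_change(mut lo: u64, mut hi: u64, spec_version_at: impl Fn(u64) -> u32) -> u64 {
    let lo_spec = spec_version_at(lo);
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if spec_version_at(mid) == lo_spec {
            // Still the old spec version; the change must be later.
            lo = mid;
        } else {
            // The new spec version is already in effect here.
            hi = mid;
        }
    }
    hi
}

Repeating this between each pair of consecutive runtime upgrades locates every boundary in a logarithmic number of queries per boundary.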

We can raise separate issues to implement these bits later.

jsdw commented Nov 08 '23 11:11

There's a nice binary search for metadata versions in substrate archive here that finds where different metadata versions switch, if that helps at all. It was quite fast iirc. Could be adapted to maybe use the network.

insipx commented Nov 08 '23 14:11

> There's a nice binary search for metadata versions in substrate archive here that finds where different metadata versions switch, if that helps at all. It was quite fast iirc. Could be adapted to maybe use the network.

Ah lovely; thanks for pointing that out!

jsdw commented Nov 08 '23 15:11

Thoughts on legacy type decoding:

I'd like to separate the spec/chain stuff from the core type decoding, so that the basic type mappings are just a map from pallet + type name to type marker.

Then, when we build the default type mappings, we can configure them to be for a specific chain/spec, which will load in the relevant mappings.

This pushes the "which chain/spec version" question to a higher level; at this level we just have some bytes, some type mappings and a type ID, and want to decode the bytes accordingly. Any interface can be built on top of this to handle multiple spec versions or chains or whatever else.

It should be easy to override type mappings on a case by case basis (imagine a Vec<Mapping> stack; we'll try the mappings in the last entry first, working our way back. Maybe we can just implement TypeDecoder on a composite type to do this).
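That composite impl might look vaguely like this (contains_type and TypeNotFound are hypothetical):

// Hypothetical: a slice of mappings is itself a TypeDecoder which tries
// the last set of mappings first, falling back towards the front:
impl TypeDecoder for [TypeMappings] {
    type TypeId = TypeId;
    type Error = LegacyDecodeError;

    fn decode_type<V: scale_decode::Visitor>(
        &self,
        type_id: &Self::TypeId,
        bytes: &mut &[u8],
        visitor: V,
    ) -> Result<V::Value, Self::Error> {
        // Pick the last set of mappings that knows about this type, and
        // delegate decoding to it:
        let mappings = self
            .iter()
            .rev()
            .find(|m| m.contains_type(type_id))
            .ok_or(LegacyDecodeError::TypeNotFound)?;
        mappings.decode_type(type_id, bytes, visitor)
    }
}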

The API might look like:

// build the default mappings for polkadot for some spec version:
let mappings = TypeMappings::builder()
    .chain("polkadot")
    .spec_version(1030)
    .build();

// we can insert new types into the above, or have some new mappings:
let mut overrides = TypeMappings::new();
overrides.insert("Balance", Type::u32());
overrides.insert("AccountId", Type::array(Type::U8, 32));
// Need to be able to insert for a specific pallet too sometimes. Maybe the "key"
// is like `Key { pallet: Option<String>, name: String }` and can be From string or (string,string).
// Or maybe we have a `insert_for_pallet` call.
overrides.insert(("balances", "Foo"), Type::u64());

// or (and this is why a Key + From impl might be nicer, so we can allow this):
let overrides = TypeMappings::from_iter([
    ("Balance", Type::u32()),
    ("AccountId", Type::array(Type::U8, 32)),
    ("Foo", Type::alias("Balance")),
]);

// There's a notion of "fallback types" sometimes in PJS; types to try to decode some value into if decoding
// into the original type fails for some reason (as I understand it, at least). So, in case it's useful,
// we'll want to be able to set N fallback types for a given type too. Perhaps this is done in the same insert call,
// and we can provide one or multiple Types (overrides to work in the same order as merged mappings, ie back to front?):
overrides.insert("Bar", [Type::alias("TryThisNext"), Type::alias("TryThisFirst")]);

// This probably resolves types into a `Types(Vec<Type>)` type thing, which might itself have a nice API to make
// it clear which order things are processed in.

// Decode a type from the base mappings:
use desub::TypeDecoder;
mappings.decode_type(...);

// Possibly arrays/Vecs of TypeMappings implement `TypeDecoder` too, so it's easy to add
// whole sets of overrides (push/pop from the end of a Vec, for instance), and users can make similar as needed:

let merged_mappings = [mappings, overrides];
merged_mappings.decode_type(...);

let merged_mappings = vec![mappings, overrides];
merged_mappings.decode_type(...);

This Type thing (type marker) describes the shape of a type (like a simple version of scale_info::TypeInfo), and can be parsed from a string (to load in PJS definitions for instance). There's probably a better name for it.

A TypeId is also present, and here would be an opaque wrapper around a pair of strings denoting the pallet and type name we want to try and decode.

We'll probably support some sort of file format so that we can load in the PJS definitions a little more easily, but I'm not bothered if we leave that for a higher level, and here just have the ability to programmatically insert whatever is needed.

Much of the logic for these things exists in one form or another, so I suspect we can reuse or draw a lot of inspiration from it.

jsdw commented Nov 08 '23 16:11

I started working on something like what was described above; a desub-core crate which exposed a couple of traits which could be implemented to decode values using either V14 or legacy type information. However, I came to the realisation that with this approach, we'd end up duplicating a lot of the type decoding logic already present in scale-decode. This led me to think more about this, and I arrived at the following.

Let's first look at how things are right now:

[diagram: the current crate relationships]

  • scale-info defines the format for our type information, which is used in V14+ metadata.
  • scale-decode uses this type information to help decode bytes into arbitrary types.
  • scale-encode uses the type information to help encode arbitrary types into bytes.
  • scale-value provides a generic Value type, which can represent any SCALE encoded bytes, and uses scale-encode and scale-decode in order to decode bytes into a Value, or encode a Value to bytes. It also has string and serde representations (see the example after this list).
  • subxt uses all of these to drive encoding and decoding things as needed when interacting with chains.
  • frame-metadata defines the shape of the metadata that substrate based nodes can provide. Given some bytes representing the metadata handed back by a chain, we can decode them into the metadata types defined here.
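For example, decoding arbitrary bytes into a Value with scale-value today looks roughly like this (assuming bytes, a type_id, and a scale_info::PortableRegistry named types are already in scope; check the crate docs for the exact signature):

use scale_value::scale::decode_as_type;

// Decode the bytes into a generic Value, using the V14+ type registry:
let value = decode_as_type(&mut &*bytes, type_id, &types)?;
// Values have a human-readable string representation:
println!("{value}");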

What I'd propose (and I've gone back and forth on this a bit, so it's still liable to change) is this:

[diagram: the proposed crate relationships]

Yellow represents crates with significant changes, and green represents new crates.

Let's talk through the changes:

Step 1: Introduce a scale-core (name tbc) crate. This crate defines a trait like:

pub trait TypeInfo {
    type TypeId<'a>;
    fn get_type_info(&'_ self, type_id: &Self::TypeId<'_>) -> Option<TypeMarker<'_>>;
}

pub enum TypeMarker<'a> {
    SequenceOf(TypeId<'a>),
    ArrayOf { value_ty: TypeId<'a>, len: usize },
    // ...
}

This trait can be implemented on anything that, given some type identifier, returns a type marker which describes the shape of the type in enough detail to decode (or encode) the bytes for it.

We might have to handle "type overrides". Perhaps adding a TypeMarker::OneOf(TypeId, TypeId) variant would give us this ability.

In theory, scale-info should be able to include this crate and implement TypeInfo on PortableRegistry (where TypeId = u32). To begin with, scale-core will prob depend on scale-info and impl the trait for it.

Step 1.5: Modify scale-decode (and scale-encode?) to make use of this.

scale-decode is currently tied to using a PortableRegistry and type IDs of u32s. It resolves types in the registry in order to see what their shape is.

What we'd like is for decoding to be generic over this TypeInfo trait. This allows us to decode and encode types using scale-info::PortableRegistry as we do now, but also lets us plug in a totally different sort of type info to drive decoding. It means that we can have one set of decode logic for everything.
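In signature terms, the goal is roughly the following (a sketch only; the function body and error type here are elided/hypothetical):

use scale_core::TypeInfo;
use scale_decode::Visitor;

// Hypothetical generic entry point: anything implementing `TypeInfo`
// can drive decoding, not just a `scale_info::PortableRegistry`:
pub fn decode_with_visitor<T, V>(
    bytes: &mut &[u8],
    type_id: &T::TypeId<'_>,
    type_info: &T,
    visitor: V,
) -> Result<V::Value, DecodeError>
where
    T: TypeInfo,
    V: Visitor,
{
    // Resolve the TypeMarker for `type_id` via `type_info`, then walk
    // the bytes according to that shape, firing visitor callbacks.
    unimplemented!()
}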

This should probably be done in tandem with step 1 to validate that step 1 provides all of the necessary info.

We may want to update scale-encode to use this as well for consistency. There isn't really a need to encode into types using historic data (that I can think of), but it would allow us to decode types from eg multiple versions of scale-info, if we ever did significant updates to our current type information stuff.

Step 2: Create scale-info-legacy crate.

scale-info defines the format for our type information from V14 onwards.

scale-info-legacy will define a TypeMapping struct which represents the type mappings (ie names to type markers) needed to decode pre-V14 metadata. Type mappings can represent differences between spec versions and chains. Type mappings can be decoded from some JSON format. Ultimately, to use a type mapping, you may need to provide a specific spec version (and perhaps chain), and at that point you can use the result to decode types by name.

It probably should also contain some predefined type mappings for eg polkadot relay chain.

Type mappings should be easy to extend, ie we may provide substrate-node defaults too, and then users can provide their own for their specific chains and extend them.
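Usage might end up looking something like this (all method names here are illustrative, not a settled API):

use scale_info_legacy::TypeMapping;

// Start from some predefined defaults and layer chain-specific
// definitions (loaded from JSON) on top:
let mut mapping = TypeMapping::substrate_defaults();
mapping.extend_from_json(include_str!("polkadot-types.json")).expect("valid type mapping JSON");

// Pin the mappings to a specific spec version; the result can then be
// used (via the scale-core TypeInfo trait) to decode types by name:
let types_for_spec = mapping.for_spec_version(1030);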

Step 3: Create desub crate.

After step 2, we have all of the machinery in place to decode old or new types.

The goals of the desub crate are:

  • To provide the logic for decoding extrinsics (this logic may use a trait like ExtrinsicTypes above, combined with the TypeInfo trait).
  • To provide the logic for decoding storage values (this logic may use a trait like StorageTypes, similar to ExtrinsicTypes, combined with the TypeInfo trait).
  • To provide a high level interface to then do this very easily for any version of metadata (and legacy TypeMappings where applicable).

The interface might end up looking like (this is changed from the previous post):

use desub::{
    // Traits for getting information about extrinsic/storage value shape from metadata
    ExtrinsicTypes,
    StorageTypes,
    // The core functionality we expose is quite simple; start with eg:
    decode_extrinsic,
    decode_storage_entry,
    // I expect we still need an error type for specific extrinsic/storage decode issues:
    ExtrinsicDecodeError,
    StorageDecodeError,
};

Where we'd have eg:

use scale_decode::Visitor;
use scale_core::TypeInfo;

// General purpose low level interface that works with anything applicable:
fn decode_v4_extrinsic<T: TypeInfo, E: ExtrinsicTypes, V: Visitor>(type_info: &T, extrinsic_types: &E, bytes: &[u8], visitor: V) -> Result<V::Value, ExtrinsicDecodeError>;
fn decode_storage_entry<T: TypeInfo, S: StorageTypes, V: Visitor>(type_info: &T, storage_types: &S, bytes: &[u8], visitor: V) -> Result<V::Value, StorageDecodeError>; // needs more thought

// We could maybe provide concrete higher level functions too to hint at how it works / provide a simple entry point:
mod v14 {
    fn decode_v4_extrinsic<V: Visitor>(metadata: &RuntimeMetadataV14, bytes: &[u8], visitor: V) -> Result<V::Value, ExtrinsicDecodeError> {
        crate::decode_v4_extrinsic(metadata.types(), metadata, bytes, visitor)
    }
    fn decode_storage_entry<V: Visitor>(metadata: &RuntimeMetadataV14, bytes: &[u8], visitor: V) -> Result<V::Value, StorageDecodeError> { // needs more thought
        crate::decode_storage_entry(metadata.types(), metadata, bytes, visitor)
    }
}
mod v13 {
    fn decode_v4_extrinsic<V: Visitor>(metadata: &RuntimeMetadataV13, type_mapping: &scale_info_legacy::TypeMapping, bytes: &[u8], visitor: V) -> Result<V::Value, ExtrinsicDecodeError> {
        crate::decode_v4_extrinsic(type_mapping, metadata, bytes, visitor)
    }
    fn decode_storage_entry<V: Visitor>(metadata: &RuntimeMetadataV13, type_mapping: &scale_info_legacy::TypeMapping, bytes: &[u8], visitor: V) -> Result<V::Value, StorageDecodeError> { // needs more thought
        crate::decode_storage_entry(type_mapping, metadata, bytes, visitor)
    }
}
// ...

Note that subxt would want to end up depending on desub so that it doesn't need to duplicate any logic for decoding extrinsics/storage values. For this to be effective, desub may want feature flags to hide the legacy stuff, or we might end up deciding on putting the core "extrinsic and storage decoding + traits" into a desub-core crate or something (though I suspect we'd still need feature flags, so one desub crate for everything might be best).

Alternative approach

The approach that I originally had in mind (see previous comments) was to have a general TypeDecoder trait that could be implemented for scale_info::PortableRegistry, as well as our TypeMapping struct which defines legacy types.

This has the advantage that it's super general, and doesn't care at all about the shape of the "type registries" and how to get type information etc. But, with just a TypeDecoder trait, you'd end up needing to write the logic to decode SCALE types more than once; ie you'd use the existing scale-decode logic to handle decoding types using V14 metadata, and then write similar decode logic to handle decoding types using legacy type mappings.

The approach I went with above instead focuses on there being a single "decode" function that can be given generic enough data (here, a TypeInfo impl and a corresponding TypeInfo::TypeId) and return a general description (here, a TypeMarker). This allows there to exist just one method to decode a type, or an extrinsic, or a storage value.

Ie in code:

////// "old" approach (see previous comments)

use desub::TypeDecoder;

// TypeDecoder is defined on v14 PortableRegistry, so that can decode types now:
let v14_decoded_value = v14_registry.decode_type(type_id, bytes, visitor)?;
// it's also defined on our legacy "type mappings", so that can also decode types:
let v13_decoded_value = v13_type_mappings.decode_type(type_id, bytes, visitor)?;

////// "new" suggestion (this comment)

use scale_core::TypeInfo;

let v14_registry = // ...
// v14 registry impls `TypeInfo` so can help drive decoding via single function:
let v14_decoded_value = scale_decode::decode(v14_registry, type_id, bytes, visitor)?;

let v13_type_mappings = // ...
// v13 type mappings also impl `TypeInfo` so can help drive decoding via single function:
let v13_decoded_value = scale_decode::decode(v13_type_mappings, type_id, bytes, visitor)?;

If it seems useful, we could end up with both approaches: do as described in this comment, and then in desub also define a TypeDecoder trait which, under the hood, uses scale_decode::decode + scale_core::TypeInfo to make it super easy for impls to decode stuff in a common way, while still allowing somebody to plug a totally different approach in above it in desub. I'm not sure offhand that this will be useful, and it can be added in a future version if it is.

Summary

Ideas like this are always prone to change, but this is the direction that I'd like to start aiming towards (unless anybody spots any glaring flaws), and so I wanted to get it in writing. We can see how it pans out and adjust the plan as we go along if we run into any unforeseen wrinkles!

jsdw commented Jan 22 '24 17:01

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/the-path-towards-decoding-historic-blocks-storage-in-rust/6373/1

Polkadot-Forum commented Feb 21 '24 14:02