substrate Isolate event serialization in system::Events for easier decoding

Now it's really hard to decode system::Events outside the runtime (e.g. in Subxt, or third party Substrate client like GSRPC).

A very common problem is, the client may miss some type definitions, and some type definitions are out-of-date, and due to the compactness of scale codec, a failure to decode a single event item will make all the remaining events unable to be decoded. As a result, there's a very high chance to cause event decoding failure.

Ideally we'd like to decode all the event types. However it's not very realistic. We must have all the type definition and keep them up-to-date. People usually only care about the events from their own pallets, but the problem is usually triggered by the ones introduced by the FRAME system, especially given their number is large, and the update is very frequent. (Sometimes even dynamic decoder like Polkadot.js fails, no wonder the static typed clients like sub-xt & GSRPC)

Mitigation: Isolate the events

To mitigate, we can change the data structure of Events to allow skipping the events with unknown types.

Now this is impossible, because the serialized events are concatenated as a vector. When decoding a Vec, the size of an element cannot be predetermined until it's decoded since it relies on the type definition of the full event type (E) is complete and accurate:

https://github.com/paritytech/substrate/blob/1d5abf01abafdb6c15bcd0172f5de09fd87c5fbf/frame/system/src/lib.rs#L616

https://github.com/paritytech/substrate/blob/1d5abf01abafdb6c15bcd0172f5de09fd87c5fbf/frame/system/src/lib.rs#L777-L784

A simple fix could be to replace the raw type E by the encoded data of E as Vec<u8>. In this way, each Vec element can be decoded without having the full type definition. And it's up to the user to decide if they will further decode the inner event or not.

Another option is to encode the event data length at the beginning of EventRecord. With a customized deserializer, we can still skip unknown event data. Compare with the above fix, this can save one Vec decoding call and reduce the CPU usage a little bit, but it also introduces customized serialization, and will introduce more burden to the client developers.

Compatibility

The both ways will break the compatibility with the existing ecosystem, especially the blockchain explorers and indexers. However we argue that this is still worth doing. To my knowledge, locating the type related problems and updating the type definition are the most time consuming job to maintain a client library. Our team has spend at least 2-3 days per month to fix problems in subxt & gsrpc.

But if the mitigation is implemented, people can easily filter the events they don't care. This allows developer to only focus on their own events.

Related work: Metadata with scale-codec types

paritytech/substrate#8615 will introduce embedded type definition to the runtime metadata. Ideally it will provide sufficient information to decode all the event data.

It's the best solution for dynamically-typed sdk like Polkadot.js. However, we think it may not be perfect for statically-typed languages like Rust & Go. For the latter ones, developers have to define or generate the types before compilation, and when the chain is upgraded, they will also need to include the changed types in the code, and carefully pick the correct version when decoding. It is especially problematic for the FRAME types.

Alternative mitigation

Another way to mitigate is to create our own event store. Just like System::Events, we can easily create a new StorageValue of a event Vec. We append events during the block execution, and clear it at on_initialize at every new block. This is already adopted by us for our message queue:

https://github.com/Phala-Network/phala-blockchain/blob/2ccca2717609363836b7145cf7fc67cb633b9e66/pallets/phala/src/mq.rs#L39-L44

However, this is not well supported by the ecosystem projects like GSRPC and SubQuery. So it introduces more efforts to develop customized ad-hoc solutions. It also doesn't solve the problem if one still wants to subscribe the extrinsic dispatch result, which is located at system events.

Aug 03 '21 08:08 h4x3rotab

cc @thiolliere

We were already thinking about making a pretty large change to events in Substrate to help with PoV size, and isolate them from other parts of the runtime.

Specifically, creating the following keys:

:CURRENT_EVENT_LEN
:CURRENT_EVENT_LEVEL
:EVENTS_<LEVEL>

And then placing up to MAX_EVENT_LEN data into each level.

With this change, we could also introduce some of the encoding changes to the events as you suggested.

Aug 03 '21 09:08 shawntabrizi

paritytech/substrate#8615 will introduce embedded type definition to the runtime metadata. Ideally it will provide sufficient information to decode all the event data.

It's the best solution for dynamically-typed sdk like Polkadot.js. However, we think it may not be perfect for statically-typed languages like Rust & Go. For the latter ones, developers have to define or generate the types before compilation.

Actually you should be able to use the metadata to dynamically "skip" over non statically defined events/types. This is what subxt does, although not very well at the moment because of the missing type definitions. In fact that was one of the original motivations for doing this work.

I've already done some work on this and I will resume it once I'm back from holiday at the end of the month (and paritytech/substrate#8615 is merged)

Aug 03 '21 09:08 ascjones

More issues about events:

It is not light client friendly. i.e. it can be expensive to proof something happened on a parachain to another parachain
- We have event topics but no one is using it. I bet most of pallet developers does not know such feature exists.
It is hard to judge between too many events vs not enough events e.g. https://github.com/paritytech/substrate/pull/9425
Event parameters are unnamed and we are currently relying on non-standard syntax in docs comment to document it.

I am not sure what's the solution here but we do need a major refactor.

Aug 03 '21 09:08 xlc

cc @thiolliere

We were already thinking about making a pretty large change to events in Substrate to help with PoV size, and isolate them from other parts of the runtime.

Specifically, creating the following keys:
* `:CURRENT_EVENT_LEN`

* `:CURRENT_EVENT_LEVEL`

* `:EVENTS_<LEVEL>`
And then placing up to MAX_EVENT_LEN data into each level.

With this change, we could also introduce some of the encoding changes to the events as you suggested.

In what way should that help the PoV size? And in what way are the events to the PoV size at all? I mean, the only problem is the deletion of the events from the old block, but that stuff is a "bug" in the trie.

Aug 03 '21 09:08 bkchr

Event parameters are unnamed and we are currently relying on non-standard syntax in docs comment to document it.

if I don't missundertand this should be fixed, we can now use named fields for events like:

#[pallet::event]
pub enum Event {
    Foo { bar: u32 },
}

Aug 03 '21 09:08 gui1117

The named Event params will definitely work now with paritytech/substrate#8615.

Aug 03 '21 09:08 ascjones

cc @thiolliere We were already thinking about making a pretty large change to events in Substrate to help with PoV size, and isolate them from other parts of the runtime. Specifically, creating the following keys:
* `:CURRENT_EVENT_LEN`

* `:CURRENT_EVENT_LEVEL`

* `:EVENTS_<LEVEL>`
And then placing up to MAX_EVENT_LEN data into each level. With this change, we could also introduce some of the encoding changes to the events as you suggested.
In what way should that help the PoV size? And in what way are the events to the PoV size at all? I mean, the only problem is the deletion of the events from the old block, but that stuff is a "bug" in the trie.

I opened a issue with the related ideas https://github.com/paritytech/polkadot-sdk/issues/309

Aug 03 '21 09:08 gui1117

I think modifying Event codec implementation so that it is decodable as Vec<u8> makes complete sense, we do that for some types already. It can be done in construct_runtime when generating the runtime event.

It is hard to judge between too many events vs not enough events

Yes it would be great to have some rocksdb benchmark to know what is our constraint here.

It is not light client friendly. i.e. it can be expensive to proof something happened on a parachain to another parachain. We have event topics but no one is using it. I bet most of pallet developers does not know such feature exists.

maybe we should create more function and types around topic to make it usable ? or is there some other direction we should take to improve this ?

Aug 03 '21 10:08 gui1117

is paritytech/substrate#8615 an ok solution all in all ?

Or otherwise it shouldn't be difficult to implement EventRecord codec so it is decodable as a Vec<u8>. Or implement Event codec so it is decodable as a Vec<u8>. What would be the more handy for external tools ? EventRecord or Event or doesn't matter.

Sep 08 '21 13:09 gui1117

Off the top of my head I would say the Event itself, since that is the part that is "dynamic".

As long as the topic type T of EventRecord can be statically defined in external tools.

Sep 08 '21 13:09 ascjones

It seems scale-info metadata should solve this issue, we could also make event decodable as a Vec<u8> which is implemented in this PR https://github.com/paritytech/substrate/pull/9738 but it is an important breaking change. Considering scale-info metadata should already solve this issue, it seems the breaking change is not worth it.

So I'll close the PR, if more people ask for it maybe we can reconsider the PR.

Sep 30 '21 10:09 gui1117

I was pointed at this issue (and paritytech/substrate#9738) after wondering why we are doing a bunch of event decoding (and indeed have issues owing to our decoding not actually 100% working yet) in subxt. I can see that it would cause breakage and that would need to be managed and timed appropriately, but I think it has merit.

Decoding events is tricky and not guaranteed to be successful. Adding the byte length before the event data would make events more consistent with extrinsics (which fortunately do have this prefix iirc), reduce the chance of errors (we don't have to try and decode all events in places like subxt when users only care about handling a few specific ones), and perhaps most importantly reduce the fallout from an error happening (if it's easy to split events, any decode errors can be contained to one event and not screw up decoding of all future events).

I hope that the issue receives consideration in the future, prior to a 1.0 release.

Jan 24 '22 10:01 jsdw

@jsdw @ascjones can you not use scale-info to skip unwanted events?

Jan 24 '22 10:01 bkchr

@bkchr what did you have in mind?

As far as I am aware, given a block of SCALE encoded events, the only way that we can "skip" over an event is to use the type information contained within the metadata to attempt to decode the event. Since different events (even of the same type) can consume different numbers of bytes, attmepting to decode it byte by byte is the only way to know how many bytes the event will ultimately take up, and where the next event will begin.

Jan 24 '22 12:01 jsdw

@jsdw @ascjones can you not use scale-info to skip unwanted events?

Yes we can, that is what we already do. Though as @jsdw says it involves recursing the type definitions and consuming the bytes as we go.

Adding the len prefix would just mean we could delete all that code, and just skip over the len bytes instead.

So as discussed previously: not an absolute requirement, it would just mean a lot less code required by any clients who want to peek into the EventRecord stream of events.

Jan 24 '22 13:01 ascjones

But wouldn't the code for skipping just be something like using decode_use_scale_info and throwing the result away?

Jan 24 '22 13:01 bkchr

This works only if we know about all event types at compile time.

For tooling which should operate on a specific pallet e.g. pallet-contracts, then a subxt API can be generated against a static version of that pallet. As long as the target runtime has a compatible version of pallet-contracts then it should be able to interact with that pallet. For events that means we should be able to decode compile time known events e.g. pallet_contracts::Event and skip over events not necessarily known at compile time e.g. missing and/or different pallet events.

This is currently achieved by downloading the metadata at runtime and dynamically partitioning the stream of Vec<EventRecord> into chunks of bytes, then allowing the user to statically decode the event(s) that we are interested in e.g.

let instantiated = result
            .find_first_event::<api::contracts::events::Instantiated>()?
            .ok_or(anyhow::anyhow!("Failed to find Instantiated event"))?;

This feature is required for cargo-contract, it should operate against any chain with a compatible version of pallet-contracts. Otherwise it would require recompiling for every different chain and possibly different runtime versions too.

I guess the difference with polkadot.js is that it allows downloading the metadata and (re)generating the types at runtime?

Jan 24 '22 16:01 ascjones

We might as well refactor the whole event system... The current one doesn't work for light clients and terrible to generate proof of exist of event for XCM purpose. No one is using the event topic. If we need to make a breaking change, should really make sure it is light client friendly and XCM friendly.

Jan 25 '22 00:01 xlc

substrate substrate copied to clipboard

Isolate event serialization in system::Events for easier decoding

Mitigation: Isolate the events

Compatibility

Related work: Metadata with scale-codec types

Alternative mitigation

substrate
substrate copied to clipboard