penumbra icon indicating copy to clipboard operation
penumbra copied to clipboard

cometindex and pindexer: defining tables for exploring block data

Open ejmg opened this issue 8 months ago • 2 comments

Summary

Following the discussions and work from #4573, #4604, #4610, and #4611, Penumbra's builtin indexer should define and provide tables that allow for efficient querying of common data that is used by block explorers like cuiloa.

Example of the problem now

Using cuiloa as a motivating example, you can find all currently used queries in the respective route.ts file of a given endpoint in the api directory of cuiloa, i.e. when someone looks up a specific IBC Channel that has been created by an IBC Client on Penumbra, this is the query that gets ran to collect the relevant information.

Putting aside query optimizations, the default index schema used by cometbft requires pulling together data that requires a lot of nested reads within a single table in addition to filtering across several relations to collect current information for e.g. a specific IBC identifier and any associated data (block, transactions, etc).

In short, it's not too bad to look up a single thing (does this block exist? Is there a nullifier with this specific value?) but the complexity and overhead of a query quickly escalates for each additional piece of information you want to find beyond that.

Suggested Solution

The desired outcome for this feature request is for penumbra's indexer to provide tables that drastically improve the ergonomics and efficiency of querying data. By using Cuiloa's current queries as a baseline, we can define tables that achieve this outcome for both Cuiloa and future alternative block explorers and related tooling.

Immediately below the following schema outlines you will also fine caveats and open questions that may be useful to skim over first.

  • [ ] clients(rowid, client_id, client_type, block_id, height, tx_id, tx_hash)

    tracks existing light clients defined on Penumbra.

    • rowid table index.
    • client_id is the same value for the key client_id found for any {create|update}_client type event from event_attributes as defined in the cometbft indexer schema, e.g. 07-tenderminet-0
    • client_type the type of the client. This is the same value in an event_attributes row with type of {create|update}_client and key client_type, e.g. 07-tendermint
    • block_id the id of either the creation block or the block with the last client_update event for the respective client. If a client has only been created with no updates, then it will be the former. If it has received updates, it will be the latter.
    • height is the height of the block identified with block_id
    • tx_id is the id of the transaction of either the creation or last update of the client. Same as block_id.
    • tx_hash is the hash of the transaction identified with tx_id
  • [ ] connections(rowid, connection_id, client_id, client_idx, counterparty_client_id, counterparty_connection_id)

    tracks existing IBC connections on Penumbra

    • rowid table index
    • connection_id is the same value for the key connection_id found for any connection_open_{init|ack} type event from event_attributes with a key of conneciton_id
    • client_id is the clients.client_id of the client that the connection belongs to.
    • client_idx is the clients.rowid of the client that the connection belongs to.
    • counterparty_client_id is the counterparty client id and the same value you will find for an event_attributes row with connection_open_{init|awk}.counterparty_client_id composite_key
    • counterparty_connection_id is the connection_id of the connection on the counterparty chain. this is the same value found in event_attributes with the composite key connection_open_{init|awk}.counterparty_connection_id
  • [ ] channels(rowid, channel_id, connection_id, connection_idx, client_id, client_idx, port_id, counterparty_channel_id, counterparty_port_id)

    tracks existing IBC channels on Penumbra

    • rowid table index
    • channel_id is the same value for the key channel_id found for any channel_open_{init|ack} type event from event_attributes with a key of channel_id
    • connection_id is the connections.connection_id of the connection the channel belongs to. The same value found in any channel_open_{init|ack}.connection_id in `event_attributes.
    • connection_idx is the connection.rowid
    • client_id is the clients.client_id of the client that the channel belongs to.
    • client_idx is the clients.rowid of the client that the channel belongs to.
    • port_id is the port of the channel and is the same value you will find for any channel_open_{init|ack}.port_id composite_key event in event_attributes.
    • counterparty_channel_id is the counterparty channel id and the same value you will find for an event_attributes row with channel_open_{init|awk}.counterparty_channel_id composite_key
    • counterparty_port_id is the port_id of the channel on the counterparty chain. this is the same value found in event_attributes with the composite key channel_open_{init|awk}.counterparty_port_id
  • [ ] client_events (rowid, client_id, client_idx, type, block_id, tx_id, header, consensus_height)

    tracks all client_create and client_update events

    • rowid table index
    • client_id the clients.client_id of the client event
    • client_idx the clients.rowid of the client event
    • type the type of the client event, i.e. client_create or client_update. client_update ought to make the overwhelmingly majority of these events.
    • block_id the blocks.rowid of the block this event occurred on
    • tx_id the tx_results.rowid of the transaction
    • header is the IBC Header segment when the type is update_client
    • consensus_height the consensus height of the counterparty chain for the given client event. This is the same value for any row in event_attributes with the composite_key of {create|update}_client.consensus_height.
  • [ ] packet_events (rowid, type, client_id, client_idx, block_id, tx_id, connection_id, connection_idx, channel_id, channel_idx, port_id, ordering, sequence, counterparty_channel_id, counterparty_port, timeout_height, timeout_timestamp, data, data_hex)

    tracks all send_packet and recv_packet events.

    • rowid table index
    • type the type of the packet event, i.e. send_packet or recv_packet.
    • client_id the clients.client_id of the packet.
    • client_idx the clients.rowid of the packet.
    • block_id the blocks.rowid of the packet.
    • tx_id the tx_results.rowid of the packet.
    • connection_id is the connections.connection_id of the connection that the packet belongs to. The same value found in any row with the composite_key {send|recv}_packet.packet_connection in event_attributes
    • connection_idx is the connection.rowid of the connection that the packet belongs to.
    • channel_id is the channel.channel_id of the channel that the packet belongs to.
      • for recv_packets, this is the value found for events with the composite_key recv_packet.packet_dst_channel.
      • for send_packets, this is the value found for events with the composite_key send_packet.packet_src_channel.
    • channel_idx is the channel.rowid of the channel that the packet belongs to.
    • port_id is the channel.port_id of the channel that the packet belongs to.
      • for recv_packets, this is the recv_packet.packet_dst_port.
      • for send_packets, this is the send_packet.packet_src_port.
    • ordering the ordering of the packet, the same value for the composite_key {send|recv}_packet.packet_channel_ordering in event_attributes
    • sequence is the packet_sequence key for a given {send|recv}_packet type event.
    • counterparty_channel_id is the counterparty channel_id.
      • for recv_packets, this is the value of the packet_src_channel key
      • for send_packets, this is the value of the packet_dst_channel key
    • counterparty_port_id is the counterparty port ID.
      • for recv_packets, this is the value of the packet_src_port key
      • for send_packets, this is the value of the packet_dst_port key
    • timeout_height the packet timeout height. this is the value of the key packet_timeout_height.
    • timeout_timestamp the packet timeout stamp. this it the value of the key packet_timeout_timestamp.
    • data is the packet data. this is the value of the key packet_data
    • data_hex is the packet data in hex form. this is the value of the key packet_data_hex

Caveats

  • the table schemas try to exchange normalization for quick and easy access to data but how this data will look in the wild is something that should probably heavily influence their definitions.
  • I don't have a perfect grasp of the IBC spec and how Penumbra integrates. As an example, is it possible for an IBC related event on Penumbra to be a block event (i.e. no transaction)? Similarly, what are the plans for supporting IBC operations like closing channels and connections?

Open questions Building off of the caveats, there are several issues that I'm not entirely confident on and the suggested schema may fail to address.

  1. Is keeping clients with an updated block_id, tx_id, etc a desirable property or should this be siloed entirely in client_events?
  2. In a similar vein, if it's not a bad idea to update clients for the most recent block_id, tx_id, etc, would it be reasonable to add this to both connections and channels?
    • every send_packet and recv_packet type event would require a write here
  3. packet_events will probably be a very big and wide table. How exactly this should be managed is not clear to me. Breaking it by type so that there are two recv_packets and send_packets tables instead might not be as bad but maybe tables that store data values like packet_data might be required?
  4. write_acknowledgements are a type of event that I have not addressed but only because I haven't included them in the block explorer thus far. This is probably something worth acknowledging.
  5. As with write_acknowledgements, the same goes for events of type action_delegate. This would be a simple table, however, action_delegate(block_id, tx_id, key, value) if it's something that the penumbra indexer should be tracking.
  6. I am not sure how useful the pattern of foo_id and foo_idx is for the above tables.
    • The idea is to save a possible join for each pair by keeping the ID for a given client, connection, channel on the table row for a given entity.
    • Similarly, indexing directly on IDs is probably(?) not possible. If I understand correctly, client_ids should always be unique to mitigate against replay attacks but I am not sure the same can be said for the semantics of connection_ids and channel_ids

ejmg avatar Jun 14 '24 21:06 ejmg