hub-monorepo feat: include hub id in event id

What is the feature you would like to implement? Include an id for a hub in the event id. And if a user provides a fromId with an invalid hub id via a subscribe request, reject the request with an error indicating the event id is from a different hub.

Why is this feature important? The fact that event ids are hub-specific is confusing.

Will the protocol spec need to be updated?? Call out the sections of the protocol spec that will need to be updated (e.g. Section 4.3 - Verifications, Section 3.1 - Identity Systems)

How should this feature be built? (optional) A design for the implementation of the feature

Additional context Add any other context or screenshots about the feature request here.

Mar 21 '23 18:03 pfletcherhill

@pfletcherhill I might take a stab at this. Was there a specific format for hub ID that you had in mind?

Proposal below, but TL;DR:

Reduce timestamp bits used from 41 to 31
Interpret timestamps as seconds since Farcaster epoch (instead of milliseconds)
Use the 10 bits we claimed above as space for the hub ID, derived from a hash of the hub's peer ID

Looking at the existing code, it seems like we'd really like to maintain the property that an event ID can be represented by a JS number, as this requires the fewest changes and no migration of existing hub data.

To avoid migrating existing hub data, we can shrink the 41 bits we allocate to the millisecond timestamp by 10 bits, so that the value represents milliseconds / 2^10 (about 2.3% smaller than seconds, since 2^10 = 1024), and repurpose the freed trailing 10 bits as the hub ID, starting from a particular date (e.g. after the May 31st release date of hub version 1.3).
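A rough sketch of what this layout could look like, to make the bit budget concrete. The names `packEventId` and `unpackEventId` are hypothetical, and the field order (timestamp, then hub ID, then sequence) is my assumption; the 12 sequence bits match the 4096-values figure discussed further down.

```typescript
// Field widths: 31 + 10 + 12 = 53 bits, which still fits in a JS safe integer.
const TIMESTAMP_BITS = 31;
const HUB_ID_BITS = 10;
const SEQUENCE_BITS = 12;

interface EventIdParts {
  seconds: number; // seconds since the Farcaster epoch
  hubId: number; // 10-bit hub identifier
  seq: number; // 12-bit per-second sequence number
}

// Throws if a field does not fit in its allotted number of bits.
function checkField(name: string, value: number, bits: number): void {
  if (!Number.isInteger(value) || value < 0 || value >= 2 ** bits) {
    throw new RangeError(`${name} does not fit in ${bits} bits`);
  }
}

function packEventId(parts: EventIdParts): number {
  checkField("seconds", parts.seconds, TIMESTAMP_BITS);
  checkField("hubId", parts.hubId, HUB_ID_BITS);
  checkField("seq", parts.seq, SEQUENCE_BITS);
  // Multiplication instead of << because JS bitwise operators truncate to 32 bits.
  return (parts.seconds * 2 ** HUB_ID_BITS + parts.hubId) * 2 ** SEQUENCE_BITS + parts.seq;
}

function unpackEventId(id: number): EventIdParts {
  const seq = id % 2 ** SEQUENCE_BITS;
  const rest = Math.floor(id / 2 ** SEQUENCE_BITS);
  const hubId = rest % 2 ** HUB_ID_BITS;
  const seconds = Math.floor(rest / 2 ** HUB_ID_BITS);
  return { seconds, hubId, seq };
}
```

With every field at its maximum, the packed value is exactly `Number.MAX_SAFE_INTEGER`, so the ID stays representable as a JS number.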

This is backwards compatible because shifting an old millisecond timestamp right by 10 bits always yields a smaller value than the same instant expressed in seconds (since 2^10 > 1000), so IDs created by the old generator are seen as having happened earlier than IDs created under the new interpretation.

Event IDs aren't actually timestamps, so this is fine. We just need old events to keep sorting correctly relative to events generated using the new approach.

Example:

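const epoch = 1609459200000;                             // Farcaster epoch (Jan 1, 2021 UTC), in ms
const millis = new Date('2023-05-16').getTime() - epoch; // 74736000000
const seconds = Math.floor(millis / 1000);               // 74736000
// Divide instead of using >>, which truncates its operand to 32 bits in JS.
Math.floor(millis / 2 ** 10) < seconds;                  // true (72984375 < 74736000)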

Again, the point isn't to be able to extract the original timestamp from the event—it's to ensure that we order event IDs generated using the old technique in a backwards compatible way with events generated using the new technique.
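To check the ordering argument against full event IDs rather than just the timestamp field, here is a small sketch. `oldEventId` and `newEventId` are hypothetical helpers, the field order in the new scheme is my assumption, and the May 31 cutover is only the example date mentioned above.

```typescript
const FARCASTER_EPOCH = 1609459200000; // ms, same value as in the example above
const HUB_ID_BITS = 10;
const SEQUENCE_BITS = 12;

// Old scheme as described in this thread: millisecond timestamp followed by a 12-bit sequence.
function oldEventId(millis: number, seq: number): number {
  return millis * 2 ** SEQUENCE_BITS + seq;
}

// Proposed scheme (assumed field order): seconds, then hub ID, then sequence.
function newEventId(seconds: number, hubId: number, seq: number): number {
  return (seconds * 2 ** HUB_ID_BITS + hubId) * 2 ** SEQUENCE_BITS + seq;
}

const cutoverMillis = Date.UTC(2023, 4, 31) - FARCASTER_EPOCH;

// Worst case for ordering: the largest possible old ID just before the cutover
// versus the smallest possible new ID (hub ID 0, sequence 0) at the cutover.
const lastOld = oldEventId(cutoverMillis - 1, 2 ** SEQUENCE_BITS - 1);
const firstNew = newEventId(Math.floor(cutoverMillis / 1000), 0, 0);

console.log(lastOld < firstNew); // true
```

The comparison holds because the new timestamp field is effectively seconds * 1024, which exceeds the old millisecond count (roughly seconds * 1000) for the same instant.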

Using the 10 lowest order bits, we can insert a hub ID generated from the first 10 bits of the blake3 hash of the hub's peer ID. This won't be universally unique, but "unique enough" for the purposes of catching errors, which was the stated goal of this ticket.
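A minimal sketch of the hub ID derivation and the `fromId` check from the original request. It assumes the blake3 digest of the peer ID is computed elsewhere and passed in as bytes (no hashing library is called here), and that the hub ID sits directly above a 12-bit sequence field; `hubIdFromDigest`, `hubIdOfEventId`, and `assertFromIdMatchesHub` are hypothetical names.

```typescript
const HUB_ID_BITS = 10;
const SEQUENCE_BITS = 12;

// First 10 bits of a digest: all 8 bits of byte 0 plus the top 2 bits of byte 1.
// The caller is expected to pass the blake3 hash of the hub's peer ID.
function hubIdFromDigest(digest: Uint8Array): number {
  if (digest.length < 2) {
    throw new RangeError("digest must be at least 2 bytes");
  }
  return (digest[0] << 2) | (digest[1] >> 6);
}

// Extracts the hub ID field from an event ID (assumed layout: timestamp, hub ID, sequence).
function hubIdOfEventId(eventId: number): number {
  return Math.floor(eventId / 2 ** SEQUENCE_BITS) % 2 ** HUB_ID_BITS;
}

// Hypothetical guard for a subscribe request: reject a fromId issued by another hub.
function assertFromIdMatchesHub(fromId: number, localHubId: number): void {
  if (hubIdOfEventId(fromId) !== localHubId) {
    throw new Error("fromId is an event id from a different hub");
  }
}
```

Event IDs generated before the cutover carry millisecond bits in that position rather than a hub ID, so a real check would presumably need to skip or special-case them. With only 1024 possible values, two hubs can also share an ID, which is the "unique enough" trade-off noted above.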

The downside here is we are increasing the likelihood that we exceed the sequence ID for a given second, i.e. hit the error condition here: https://github.com/farcasterxyz/hub-monorepo/blob/71eef0c5593956e57be1af0800c9e50142d356e9/apps/hubble/src/storage/stores/storeEventHandler.ts#L133-L135

I don't think we need to worry about this too much: 2^12 is 4096 distinct values in a single second before we overflow, and by the time we're at the scale where that becomes a problem we'll be able to make a deeper investment in a different event ID system.
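To illustrate the overflow condition, here is a hedged sketch of a generator under the proposed layout. It is not the code in `storeEventHandler.ts`; `EventIdGenerator` is a hypothetical class and the field order is my assumption. The point is that the sequence window widens from one millisecond to one second, so 4096 events in the same second exhaust it.

```typescript
const HUB_ID_BITS = 10;
const SEQUENCE_BITS = 12;

class EventIdGenerator {
  private lastSeconds = -1;
  private seq = 0;
  private readonly hubId: number;

  constructor(hubId: number) {
    this.hubId = hubId;
  }

  // nowSeconds is seconds since the Farcaster epoch, supplied by the caller.
  generate(nowSeconds: number): number {
    if (nowSeconds > this.lastSeconds) {
      // New second: restart the sequence.
      this.lastSeconds = nowSeconds;
      this.seq = 0;
    } else {
      // Same second (or clock went backwards): keep counting in the last second.
      this.seq += 1;
    }
    if (this.seq >= 2 ** SEQUENCE_BITS) {
      throw new Error("sequence exhausted for this second");
    }
    return (this.lastSeconds * 2 ** HUB_ID_BITS + this.hubId) * 2 ** SEQUENCE_BITS + this.seq;
  }
}
```

In this sketch the 4097th call within a single second throws, and the generator recovers once the clock moves to the next second.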

May 16 '23 05:05 sds

,🟩🟩🟩🟩

Sep 05 '23 20:09 Meysamhassani
