helia
feat: add ipns reproviding
What
This PR adds the functionality necessary for a Helia node to reprovide the IPNS records it created before they expire.
Why
Currently, the ipns.republish method, which runs the republishing loop, does not actually perform any republishing.
How
- Add the ability to iterate over records in the datastore.
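A minimal sketch of what iterating over stored records could look like. It assumes a datastore whose query accepts a key prefix and asynchronously yields key/value pairs, roughly mirroring the shape of the interface-datastore query API (which uses Key objects rather than the plain strings used here). MemoryStore and listRecordKeys are hypothetical stand-ins, not Helia code; the /dht/record/ prefix is where the marshalled DHT records live.

```typescript
// Hypothetical minimal datastore pair: string keys, byte values.
interface Pair {
  key: string
  value: Uint8Array
}

// In-memory stand-in for a real datastore, for illustration only.
class MemoryStore {
  private readonly data = new Map<string, Uint8Array>()

  async put (key: string, value: Uint8Array): Promise<void> {
    this.data.set(key, value)
  }

  // Yield every pair whose key starts with the given prefix.
  async * query (opts: { prefix: string }): AsyncGenerator<Pair> {
    for (const [key, value] of this.data) {
      if (key.startsWith(opts.prefix)) {
        yield { key, value }
      }
    }
  }
}

// Prefix under which marshalled DHT records (including IPNS records) are stored.
const DHT_RECORD_PREFIX = '/dht/record/'

// Collect the keys of all stored records so a republish loop can visit them.
async function listRecordKeys (store: MemoryStore): Promise<string[]> {
  const keys: string[] = []
  for await (const { key } of store.query({ prefix: DHT_RECORD_PREFIX })) {
    keys.push(key)
  }
  return keys
}
```

Keys outside the prefix (for example blocks or other module state) are never visited, so the republish loop only touches DHT records.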
Open question about storing keys
- We don't store the private key when publish is called. This means that we can only reprovide IPNS records that are still valid, and with the current defaults (48-hour lifetime and 23-hour republish interval) that means one round of republishing, after which all stored IPNS records become invalid.
- This raises the question of whether we should store the keys so that we can create new valid records for stored IPNS names.
- After looking at how we manage the self key for the libp2p peer we create, it seems like we probably want to add the keychain. The question then is whether to add the keychain to the Helia interface or just the IPNS module.
- Adding a keychain would allow us to iterate over keys rather than IPNS names when republishing, though I'm not sure there's any obvious benefit to this.
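The trade-off in this open question can be sketched as a small decision function. It assumes a stored record exposes its validity expiry as a millisecond timestamp (IPNS records actually carry validity as an RFC 3339 string; parsing is omitted), and that holding the private key means a fresh record with a new validity can be signed. StoredRecord and republishAction are hypothetical names.

```typescript
// Hypothetical view of a stored IPNS record: only the fields this decision needs.
interface StoredRecord {
  // End of the record's validity, in milliseconds since the Unix epoch.
  validityExpiresAt: number
  // Whether the private key for this name is available locally.
  hasPrivateKey: boolean
}

type RepublishAction = 'resign' | 'reput' | 'skip'

// Decide what a republish pass can do with a stored record at time `now`.
function republishAction (record: StoredRecord, now: number): RepublishAction {
  if (record.hasPrivateKey) {
    // With the key we can sign a new record with a fresh validity, indefinitely.
    return 'resign'
  }
  if (record.validityExpiresAt > now) {
    // Without the key we can only re-put the existing signed record while it is valid.
    return 'reput'
  }
  // Expired and no key: nothing useful can be republished.
  return 'skip'
}
```

Storing keys turns every eventual 'skip' into a 'resign', which is the motivation for the keychain discussion.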
https://github.com/ipfs/helia/blob/1e20e38f259e2a3a402f7be97e19e26826a01e12/packages/ipns/src/index.ts#L450
I'll wait for feedback from @achingbrain regarding key storage and the necessity of the keychain (see PR description).
Following a discussion offline with @achingbrain, we need to move key management internally to implement reproviding.
Key management (including storing keys encrypted at rest) is typically handled by the js-libp2p keychain; however, we don't have libp2p in @helia/http (which can also be used with @helia/ipns via delegated routers). This means that we probably need a dedicated keychain instance in the IPNS class, so that we don't depend on the libp2p instance.
This means that we probably need a dedicated keychain instance in the IPNS class
@helia/http could also use a libp2p instance that's only configured with HTTP routers? Then you don't need to handle things differently at this level.
It'd also be good to see just how lightweight we can make libp2p for this sort of use-case.
@helia/http could also use a libp2p instance that's only configured with HTTP routers? Then you don't need to handle things differently at this level.
Yeah, we could try that first and see how it feels. It may increase the bundle size.
I also vaguely remember there being a bug in libp2p whereby instantiating it without the DHT causes an infinite random walk loop with the defaults, as it's looking for a circuit relay reservation by default.
Would this also allow Helia to make use of the libp2p over HTTP stuff you've been working on?
It may increase the bundle size.
Looking at the breakdown, the usual suspects are already there (@libp2p/crypto, @noble/hashes, etc.), so it might not increase it by that much.
I also vaguely remember there being a bug in libp2p whereby instantiating it without the DHT causes an infinite random walk loop with the defaults, as it's looking for a circuit relay reservation by default.
I'd imagine this configuration wouldn't configure a relay listener so it wouldn't try to find a relay?
We need this landing, really - https://github.com/ipfs/specs/pull/476
Would this also allow Helia to make use of the libp2p over HTTP stuff you've been working on?
Yes, though you'd need transports etc. in order to use libp2p for the transport layer.
As discussed in https://github.com/ipfs/helia/discussions/807, the next steps here:
- Take named keys (strings, persisted in the keychain using a separate namespace) as input to publish. Keys will be generated internally.
- Republish will iterate over records in the datastore.
- Add unpublish('keyname') for deleting records.
- Add the public key to the return type of publish, because we can't derive the name when the key is managed by the keychain.
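These next steps could translate into an API shape roughly like the following. Everything here is hypothetical: NamedKeyIPNS, IPNS_KEY_NAMESPACE and PublishResult are illustrative names, Maps stand in for the keychain and the datastore, and the "public keys" are placeholder bytes rather than real key material (no signing happens).

```typescript
// Hypothetical result type: publish returns the public key because the caller
// no longer holds the key and so cannot derive the IPNS name itself.
interface PublishResult {
  publicKey: Uint8Array
  value: string
}

// Hypothetical namespace prefix separating IPNS keys from other keychain entries.
const IPNS_KEY_NAMESPACE = 'ipns/'

class NamedKeyIPNS {
  // Stand-ins for the keychain and the datastore.
  private readonly keychain = new Map<string, Uint8Array>()
  private readonly records = new Map<string, string>()

  publish (keyName: string, value: string): PublishResult {
    const id = IPNS_KEY_NAMESPACE + keyName
    let publicKey = this.keychain.get(id)
    if (publicKey == null) {
      // Keys are generated internally on first use; placeholder bytes here.
      publicKey = new Uint8Array([this.keychain.size + 1])
      this.keychain.set(id, publicKey)
    }
    // A real implementation would sign and store an IPNS record; we keep the value only.
    this.records.set(keyName, value)
    return { publicKey, value }
  }

  // Delete the stored record so it is no longer republished. Keeping the key
  // afterwards is a hypothetical choice so the same name can be reused.
  unpublish (keyName: string): void {
    this.records.delete(keyName)
  }

  // Republishing iterates over stored records rather than over keys.
  * recordsToRepublish (): Generator<[string, string]> {
    yield * this.records
  }
}
```

Publishing twice under the same key name reuses the generated key, so the returned public key (and therefore the IPNS name) stays stable across updates.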
I have an open question: we currently store the marshalled DHT record for IPNS records in the datastore.
As part of republishing, we need to also persist some additional local metadata associated with the IPNS record, like the keychain’s keyname (the key name string that the user passes) associated with the record and potentially also the lifetime. Both of these are needed for republishing.
The two ways I’ve explored storing this metadata are:
- Take the current libp2p DHT record we store and make it part of an envelope which also stores some local metadata (lifetime and keychain’s keyname)
- Store the local metadata using a separate key in the datastore specific for ipns metadata.
The latter approach seems a bit easier and less likely to have downstream effects or unintended consequences (like polluting /dht/record/-namespaced keys in the datastore with data that isn't part of a DHT record).
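A sketch of the second option: the marshalled DHT record stays untouched under /dht/record/, and the local metadata goes under its own prefix keyed by the same suffix, so the two can be joined during republishing. The /ipns/metadata/ prefix, the key derivation and the helper names are hypothetical, and metadata values are kept as plain objects here to sidestep the encoding question.

```typescript
// Existing prefix for marshalled DHT records; other modules read it, so it is left alone.
const DHT_RECORD_PREFIX = '/dht/record/'
// Hypothetical prefix reserved for IPNS-local metadata.
const IPNS_METADATA_PREFIX = '/ipns/metadata/'

interface IPNSRecordMetadata {
  keyName: string
  lifetime: number
}

// Derive both datastore keys from the same record identifier so they can be joined.
function datastoreKeys (recordId: string): { record: string, metadata: string } {
  return {
    record: DHT_RECORD_PREFIX + recordId,
    metadata: IPNS_METADATA_PREFIX + recordId
  }
}

// During republishing, pair each stored record with its metadata, if any.
function joinForRepublish (
  store: Map<string, unknown>,
  recordIds: string[]
): Array<{ recordId: string, metadata: IPNSRecordMetadata | undefined }> {
  return recordIds.map((recordId) => {
    const keys = datastoreKeys(recordId)
    const metadata = store.get(keys.metadata) as IPNSRecordMetadata | undefined
    return { recordId, metadata }
  })
}
```

A record with no metadata entry (for instance one this node only stores on behalf of other peers) would simply have nothing to republish with, which keeps locally published names distinguishable without touching the shared /dht/record/ values.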
Since the datastore expects binary values, what are the conventions around storing such data in binary format? Should I use Protobuf for this, or can I use CBOR (which we already depend on)?
For the sake of the example, let's assume I have the following struct that I want to persist:
```typescript
export interface IPNSRecordMetadata {
  keyName: string
  lifetime: number
}
```
@achingbrain
Most of the datastore values are protobuf encoded - it's nice because it has predictable decoding behaviour and uses schemas which will protect us from ourselves.
I think storing extra metadata under a separate key prefix is the way to go. The /dht/record/ prefix is populated and read by at least kad-dht and ipns-pubsub, so we can't change the data structures stored there, not without a very tedious upgrade process.
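To illustrate what a protobuf-encoded value looks like on the wire, here is a hand-rolled encoder and decoder for a hypothetical schema, message IPNSRecordMetadata { string keyName = 1; uint64 lifetime = 2; }. In practice the schema would live in a .proto file and the codec would be generated (js-libp2p uses protons for this); the manual version below only shows the wire layout and uses plain numbers rather than bigint for the uint64 field.

```typescript
interface IPNSRecordMetadata {
  keyName: string
  lifetime: number
}

// Encode a non-negative safe integer as a protobuf base-128 varint.
function encodeVarint (value: number): number[] {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('varint must be a non-negative safe integer')
  }
  const bytes: number[] = []
  while (value >= 0x80) {
    bytes.push((value % 0x80) | 0x80)
    value = Math.floor(value / 0x80)
  }
  bytes.push(value)
  return bytes
}

// Decode a varint starting at `offset`; returns the value and the next offset.
function decodeVarint (buf: Uint8Array, offset: number): [number, number] {
  let value = 0
  let multiplier = 1
  while (true) {
    if (offset >= buf.length) throw new RangeError('truncated varint')
    const byte = buf[offset++]
    value += (byte & 0x7f) * multiplier
    if ((byte & 0x80) === 0) return [value, offset]
    multiplier *= 0x80
  }
}

// Field 1 (keyName), wire type 2 (length-delimited): tag byte 0x0a.
// Field 2 (lifetime), wire type 0 (varint): tag byte 0x10.
function encodeMetadata (m: IPNSRecordMetadata): Uint8Array {
  const name = new TextEncoder().encode(m.keyName)
  return Uint8Array.from([
    0x0a, ...encodeVarint(name.length), ...name,
    0x10, ...encodeVarint(m.lifetime)
  ])
}

function decodeMetadata (buf: Uint8Array): IPNSRecordMetadata {
  const out: IPNSRecordMetadata = { keyName: '', lifetime: 0 }
  let offset = 0
  while (offset < buf.length) {
    const [tag, next] = decodeVarint(buf, offset)
    offset = next
    if (tag === 0x0a) {
      const [len, start] = decodeVarint(buf, offset)
      out.keyName = new TextDecoder().decode(buf.subarray(start, start + len))
      offset = start + len
    } else if (tag === 0x10) {
      const [lifetime, after] = decodeVarint(buf, offset)
      out.lifetime = lifetime
      offset = after
    } else {
      throw new Error(`unexpected tag ${tag}`)
    }
  }
  return out
}
```

For example, { keyName: 'k', lifetime: 1 } encodes to the five bytes 0x0a 0x01 0x6b 0x10 0x01. The explicit field numbers are what give protobuf the predictable decoding behaviour mentioned above: adding a field later does not change how existing bytes are read.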
Update on this PR
While working on https://github.com/libp2p/js-libp2p/pull/3238, which fixes some bugs in the kad-dht reprovider, I realised a couple of things regarding this PR:
- We currently run the republish loop every 23 hours.
- We should at the very least run it every 24 hours, and ideally every 1 hour, ensuring that we only republish records about to expire (either DHT expiry or validity expiry) within a certain threshold.
- The kad-dht implementation runs the loop every hour and reprovides every 24 hours (threshold) https://github.com/libp2p/js-libp2p/blob/main/packages/kad-dht/src/constants.ts/#L17
- The Go implementation republishes every 4 hours (https://github.com/ipfs/boxo/blob/7d065b2d7b52d49d58159a801df3dbddfb2d7555/namesys/republisher/repub.go#L29)
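The hourly-loop-plus-threshold approach can be sketched as a filter applied on every tick. It assumes each stored record exposes two millisecond timestamps, when the DHT copy expires and when the IPNS validity ends; the 1-hour interval and 24-hour threshold echo the kad-dht figures above, while RecordTiming, dueForRepublish and startRepublishLoop are hypothetical names.

```typescript
// Hypothetical timing view of a stored record (milliseconds since the Unix epoch).
interface RecordTiming {
  id: string
  dhtExpiresAt: number
  validityExpiresAt: number
}

const HOUR = 60 * 60 * 1000
// Illustrative values: how often the loop wakes up, and how close to
// expiry a record must be before it is republished.
const REPUBLISH_CHECK_INTERVAL = 1 * HOUR
const REPUBLISH_THRESHOLD = 24 * HOUR

// Return the records whose earliest expiry falls within `threshold` of `now`.
function dueForRepublish (
  records: RecordTiming[],
  now: number,
  threshold = REPUBLISH_THRESHOLD
): RecordTiming[] {
  return records.filter((r) => Math.min(r.dhtExpiresAt, r.validityExpiresAt) - now <= threshold)
}

// Run a check immediately, then once per interval; returns a function that stops the loop.
function startRepublishLoop (
  load: () => RecordTiming[],
  republish: (r: RecordTiming) => void,
  interval = REPUBLISH_CHECK_INTERVAL
): () => void {
  const tick = (): void => {
    for (const record of dueForRepublish(load(), Date.now())) {
      republish(record)
    }
  }
  tick()
  const timer = setInterval(tick, interval)
  return () => clearInterval(timer)
}
```

Waking up often but acting only on records near either expiry keeps the network traffic close to one republish per record per threshold window, while a missed tick costs at most one interval of slack.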
@achingbrain Friendly ping for a review