go-libp2p-core
go-libp2p-core copied to clipboard
peerstore: model address labels
This PR introduces an AddressProvenance data type that acts like an unsealed enum. This means that users can add user-defined values, and have them tracked by the peerstore.
Address provenances are single-byte values. The range [0x00, 0x80) is reserved for libp2p, and we ship with five values (from lowest to highest precedence): unknown, third party, untrusted, trusted, manual.
The range [0x80, 0xff] is at the disposal of users.
The proposal to incorporate this feature in the peerstore APIs consists of adding: ...peerstore.ReadOption and ...peerstore.WriteOption arguments to peerstore methods, to record the provenance when inserting addresses, and filter by provenances when querying and filtering addresses.
That interface change, coupled with the unknown default value, would allow users to stay non-breaking. Implementations of the AddrBook interface would break; but AFAIK, we are the only implementers, so the shockwave is greatly absorbed.
~I'm not sure why we need a registry for this. Can't we just store strings? If it becomes an issue, we can easily create a compression table by just storing a list of used provenance in the peerstore.~ edit: is the registry just for convenience? I.e., it's not a global registry? edit edit: it totally is. I thought we were going to try to maintain a global registry. This makes a lot more sense
I'm also not sure how this solves the problem. Could you explain the motivation for provenance? Do applications need to be able to add additional information to addresses? If so, I'm not sure if we need to address that now. The pressing issue now is that we need to decide which addresses to dial and/or keep.
From where I stand, I believe the main thing we need to be able to do is rank addresses for both dialing and garbage collection. I'd classify addresses into two categories, certified and uncertified, and three uncertified classes, user/application/gossip.
-
Certified: Signed peer routing records.
-
Uncertified: Addresses that have not been certified by the peer.
- User: Explicitly specified by the user.
- Application: Explicitly specified by the application. We could consider having multiple levels of application?
- Gossip: Random addresses we get from the network, etc. We could also have multiple levels?
-
When dialing, we'd try user, then application + certified, then maybe a few gossiped addresses.
-
When exchanging peer information, we should exchange at least the certified addresses. When exchanging uncertified addresses, other nodes should treat them as gossiped.
-
Applications may choose to exchange uncertified addresses over other protocols. When they do this, they can choose to record them as "application".
@Stebalien I think I could've explained some aspects better in the description, but I had to rush through. We are very much aligned. The registry is effectively a compression table, and it uses Golang type acrobatics to avoid runtime costs in string representation. Note that enum values need to remain constant across restarts, to preserve semantics when using the datastore-backed peerstore.
The categorisation you propose is very similar to what's in the out-of-the-box table, with a few caveats:
- In my view, certification is one way of acquiring trust. I went for modelling "trusted"/"untrusted" instead, but happy to reconsider.
- Gossip == third party.
When exchanging peer information, we should exchange at least the certified addresses. When exchanging uncertified addresses, other nodes should treat them as gossiped.
This layer (peerstore) is mostly concerned with attaching those labels to addresses, and their subsequent queriability, and also with storing the certified record. Each subsystem is free to decide how and which addresses it exchanges; it can filter by querying the peerstore appropriately.
Applications may choose to exchange uncertified addresses over other protocols. When they do this, they can choose to record them as "application".
The problem with a generic "application" tag is that you may have multiple "applications" behind the same libp2p node. I believe a single value is short-sighted.
After sleeping on it, I think there's a larger opportunity to sculpt this solution into a more general "peer/address labels" mechanism.
-
Subsystems and apps can attach arbitrary labels to peers or addresses, and query the peerstore for them via
...peerstore.Options. -
That model would fit in well with a non-ref-counted #117, where subsystems attach labels to peers, and specify "policies" for labels, e.g. label
dht-routing-table=> "do not collect".
The registry is effectively a compression table, and it uses Golang type acrobatics to avoid runtime costs in string representation
My suggestion is that we just create it automatically on first use, storing the table to disk iff the peerstore is persisted. Otherwise, sub-components need to coordinate.
After sleeping on it, I think there's a larger opportunity to sculpt this solution into a more general "peer/address labels" mechanism.
What about multiple address store sources?
The problem with a generic "application" tag is that you may have multiple "applications" behind the same libp2p node. I believe a single value is short-sighted.
Fair enough. My concern is that if the tags don't imply ordering/preference, it'll be hard to choose which ones to dial first. I want to make sure we've thought this all the way through to the end goal.
My suggestion is that we just create it automatically on first use, storing the table to disk iff the peerstore is persisted. Otherwise, sub-components need to coordinate.
Without lifecycle methods (peerstore.Start()), we don't have a good way of binding/registering labels before the peerstore starts and reconciles its table with disk. We would need to inject them at peerstore construction time, which is the Host construction time, which means that we'd need to inject user-defined labels as a libp2p Host option -- that's too far up.
In other words, there's a few things to fix in libp2p before we can make this nicer.
I'd say, let's shoot for the API we want, caveating against known pitfalls. When we fix the init/construction logic of libp2p, there'll be only one thing to fix, instead of two.
What about multiple address store sources?
Not sure I follow. You mean addresses that have more than one source? You'd attach two/three/as many labels as you wish to them.
Fair enough. My concern is that if the tags don't imply ordering/preference, it'll be hard to choose which ones to dial first. I want to make sure we've thought this all the way through to the end goal.
At this point I don’t want to use this as an implicit/hardcoded precedence metric. With the modular dialer, users should easily be able to configure precedence. With dialer v1, anything we do to prioritise dials is gonna be spaghetti and adhoc. With the modular dialer, you’d be able to specify the order of dials when instantiating the pipeline.
@Stebalien I've generified this to address (and in the future, peer) labels.
@raulk This looks good to me so far; I might remove the comments about labels applying to either addresses or peers though, at least until we have an API to set and read labels for peers.
As it's worded now, it's not clear whether ps.AddAddrs(p, addrs, Labels(someLabel)) would add the labels to the peer or the addrs.
Without lifecycle methods (peerstore.Start()), we don't have a good way of binding/registering labels before the peerstore starts and reconciles its table with disk. We would need to inject them at peerstore construction time, which is the Host construction time, which means that we'd need to inject user-defined labels as a libp2p Host option -- that's too far up.
We intern them:
- If we have a persistent peerstore, we restore the label set from disk on load.
- When the user calls
AddAddrs(..., Label("mylabel")), we: - Check to see if the label is a known label.
- If not, add it to the label set, assign the next ID, and persist the label set.
- Use the label's ID.
Basically, lazily register labels and automatically assign codes as we need them.
Discussed out of band. @raulk proposed that, to solve the "who do I dial issue" without dialer v2, we can pass dialer constraints along with the context.
The only concerns I have here are:
- Annoyance. We can ensure we only use certified records in the DHT easily but ensuring this in other services will become annoying.
- Multiple parallel dials. We already have the issue where new addresses aren't added to existing dials. However, this means that parallel dials where one is more "tolerant" than the other may behave oddly.
- I'd like to be able to set a default policy (e.g.: user + certified, falling back on everything else if we don't have either).