
Content Discovery - Scalability

Open • GeorgeTsagk opened this issue 2 years ago • 9 comments

Currently there is no structured way to seek a specific event by its id.

AFAIU, a client is connected to many relays, and if it seeks an event with a specific id it has to query all of the connected relays in the hope that one of them returns it.

  • What happens if the desired event is never returned?

  • If the client proceeds to seek out new relays and replays the same queries to them, is there any guarantee that the desired event will eventually be returned?

  • How does the client know when to give up (i.e. when the id doesn't exist or the event was deleted)?

If there are no clear answers to the above questions, would it be fair to say that Nostr won't scale if many relays and users come on board in the near future? As relays have no clear incentive to store and relay other people's events, won't they eventually shut down or start exploring ways of charging for their services?

The current approach that Nostr follows is "ask everybody and eventually you'll find it", which is simple in design but troublesome in practice: depending on how you optimize it, it leads to overwhelming network traffic, very slow query responses, or both.
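To make the cost concrete, here is a minimal Python sketch of the "ask everybody" lookup, assuming relays are modeled as in-memory dicts (a hypothetical stand-in; real relays are reached over WebSockets and usually queried in parallel). It counts one query per relay and shows that a miss looks the same whether the event never existed, was deleted, or is being withheld.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a relay: maps event id -> event.
Relay = dict[str, dict]


def naive_lookup(event_id: str, relays: list[Relay]) -> tuple[Optional[dict], int]:
    """Ask every connected relay for the id; return the event and the number of queries sent."""
    queries = 0
    for relay in relays:
        queries += 1  # one REQ per relay
        event = relay.get(event_id)
        if event is not None:
            return event, queries
    # No relay returned it: "missing", "deleted" and "censored" are indistinguishable here.
    return None, queries
```

In the worst case every lookup costs one query per known relay, which is the traffic problem described above.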

Does Nostr need a NIP for smart relay-to-relay query propagation? That would introduce a baseline network structure.

GeorgeTsagk, Dec 27 '22

This is completely wrong, you can perfectly query the right event (or events) by id, e.g. ["REQ", "just one", {"ids": [event_id]}]
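For reference, a small Python sketch of serializing that id lookup as a NIP-01 REQ message. The helper name build_req is hypothetical; only the ["REQ", <subscription_id>, <filter>] shape comes from NIP-01. Forming the query is the easy part; whether any connected relay answers it is what the rest of this thread is about.

```python
import json


def build_req(subscription_id: str, event_ids: list[str]) -> str:
    """Serialize a NIP-01 REQ asking a relay for specific events by id."""
    # NIP-01 client messages are JSON arrays: ["REQ", <subscription_id>, <filter>, ...]
    id_filter = {"ids": event_ids}
    return json.dumps(["REQ", subscription_id, id_filter])
```

Relays reply with matching ["EVENT", <subscription_id>, <event>] messages, and the client can end the subscription with ["CLOSE", <subscription_id>].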

eskema, Dec 27 '22

Events can either be seen inside the network or come from a link. If they're coming from a link they can come with relay information attached; see nevent in NIP-19. If they're being referenced inside the network they will either have an author, so a client can use the relays it knows are used by the specific author to query for it; or they can be referenced by some other event, in this case the reference should also come with an indication of the relay in which that event was seen. Aside from that, a client can always use some general-purpose "fallback" relays to query. If none of that works just stop searching. There are no guarantees.
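The search order described here can be sketched in Python. The ordering (link hints first, then the author's relays, then reference hints, then fallbacks), the query callable, and the budget parameter are assumptions for illustration; nothing in the protocol mandates them.

```python
from typing import Callable, Iterable, Optional


def candidate_relays(
    nevent_relays: Iterable[str] = (),
    author_relays: Iterable[str] = (),
    reference_hints: Iterable[str] = (),
    fallback_relays: Iterable[str] = (),
) -> list[str]:
    """Order relays to try: link hints, author's relays, reference hints, then fallbacks."""
    seen: set[str] = set()
    ordered: list[str] = []
    for source in (nevent_relays, author_relays, reference_hints, fallback_relays):
        for url in source:
            if url not in seen:  # ask each relay at most once
                seen.add(url)
                ordered.append(url)
    return ordered


def find_event(
    event_id: str,
    relays: list[str],
    query: Callable[[str, str], Optional[dict]],  # hypothetical: sends a REQ by id to one relay
    budget: int = 10,
) -> Optional[dict]:
    """Try relays in order; give up after `budget` relays without a hit."""
    for url in relays[:budget]:
        event = query(url, event_id)
        if event is not None:
            return event
    return None  # "If none of that works just stop searching."
```

The budget is the explicit "when to give up" knob; returning None still says nothing about why the event was not found.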

fiatjaf, Dec 27 '22

> this is completely wrong, you can perfectly query the right event (or events) by id.

I am not talking about the client being able to form the query, I am talking about how the event is searched for through the network.

For example, what happens if none of the relays I'm connected to returns any result for the query ["REQ", "just one", {"ids": [event_id]}]?

In case my initial description wasn't clear: I'm not referring to the form of the query, but to what happens at the relay level.

> If they're being referenced inside the network they will either have an author, so a client can use the relays it knows are used by the specific author to query for it;

In this case how does a client know which is the up-to-date list of relays used by the author?

> in this case the reference should also come with an indication of the relay in which that event was seen

Right, but that would only serve as a hint, nobody guarantees that the referenced relay still operates and/or isn't censoring.

> If none of that works just stop searching. There are no guarantees.

So what exactly happens if the original set of relays used by the author is simultaneously censored/blocked?

GeorgeTsagk, Dec 27 '22

> In this case how does a client know which is the up-to-date list of relays used by the author?

It either infers that from hints and events it sees on the network, or gets it from the author himself when the user starts following them, or queries their relay recommendations on other relays, or uses NIP-19 or NIP-35; there is no guarantee of anything, and clients must try their best. Nostr doesn't want to replace all the other means of communication that have existed since the birth of humanity, these other means can still be used to communicate information.

If someone just vanishes without any hint, starts publishing on a new relay no one knows about, and doesn't bother to signal that in any way, that is them exercising their choice not to be reachable.

> Right, but that would only serve as a hint, nobody guarantees that the referenced relay still operates and/or isn't censoring.

Yes, no one guarantees. There are no guarantees. For guarantees you need a blockchain (still not 100% guaranteed though).

> So what exactly happens if the original set of relays used by the author is simultaneously censored/blocked?

How exactly? All at the same time? I think this would be a rare event targeting a specific person who probably knows they are at risk of being censored, so this person could have taken some measures beforehand. If they don't, they can start publishing on new relays and try to spread hints, through other relays and through people in closer contact with them, about which relays they are on now; clients must then be smart enough to pick these hints up from other events and start connecting to the new relays. Not ideal, I know, but much better than what we have today.

Other tricks can be added to improve the censorship resistance of the protocol; for example, someone has suggested using Hypercore so that relay information about a key can be propagated over a more solid P2P protocol. These things can be added optionally, in a backwards-compatible way, by only some clients. Ultimately they can also be censored, and there are no guarantees of anything.

fiatjaf, Dec 27 '22

> Nostr doesn't want to replace all the other means of communication that have existed since the birth of humanity, these other means can still be used to communicate information.

I think the point is: if Nostr must rely on other existing communication channels to circumvent censorship, then what exactly is Nostr's value proposition? What does it add to those existing channels?

I'm not saying there isn't a point to using Nostr, but I can't find a clear explanation of the value proposition and how it's achieved. This might be just a matter of documentation.

lucash-dev, Dec 28 '22

> if Nostr must rely on other existing communication channels to circumvent censorship, then what exactly is Nostr's value proposition? What does it add to those existing channels?

@lucash-dev My impression is that it adds a non-custodial publication standard. I would contrast this with the custodial drawbacks outlined in nostr/README.md section "The problem with Mastodon ..."

Regarding client isolation censorship, I believe the situation is analogous to receiving/broadcasting Bitcoin blocks/transactions respectively? We can assume an individual author's publications will make it through some channel without specifying it.

Regarding publication continuity, an event can at least reference other event publications via the tags property, somewhat like an optional form of SSB chaining.
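A rough Python sketch of that optional chaining, assuming a hypothetical convention where an event's first "e" tag points at the author's previous publication. NIP-01 "e" tags do carry an optional relay URL in the third position, which the walk passes along as a hint; the fetch callable is a stand-in for a real relay query. A missing event simply ends the walk, which is the continuity gap being discussed.

```python
from typing import Callable, Optional

# Hypothetical: fetch(event_id, relay_hint) returns the event dict or None.
Fetch = Callable[[str, Optional[str]], Optional[dict]]


def walk_chain(start_id: str, fetch: Fetch, max_steps: int = 100) -> list[str]:
    """Follow each event's first "e" tag back through earlier publications."""
    chain: list[str] = []
    event_id: Optional[str] = start_id
    relay_hint: Optional[str] = None
    while event_id is not None and len(chain) < max_steps:
        event = fetch(event_id, relay_hint)
        if event is None:  # broken link: no relay returned the referenced event
            break
        chain.append(event_id)
        e_tags = [t for t in event.get("tags", []) if t and t[0] == "e"]
        if not e_tags:
            break  # reached the first publication in the chain
        prev = e_tags[0]
        event_id = prev[1]
        relay_hint = prev[2] if len(prev) > 2 and prev[2] else None
    return chain
```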

> Currently there is no structured way to seek a specific event by its id.

@GeorgeTsagk I think some analogous distributed hash table strategy might be found to address this? Events seem similar to files on IPFS, for example.

nullpraxia, Dec 30 '22

> Regarding client isolation censorship, I believe the situation is analogous to receiving/broadcasting Bitcoin blocks/transactions respectively? We can assume an individual author's publications will make it through some channel without specifying it.

It's quite different.

In Bitcoin every node in the p2p network should receive a copy of blocks and transactions (it's a broadcast network), so you just need to connect to any honest node to fetch them. Specifically, if you have a path of honest nodes between sender and receiver the communication happens. This is a pretty weak assumption. Also, the protocol explicitly tells nodes to broadcast transactions and blocks, and how to do it. There's no "somehow blocks reach nodes through unspecified channels".

In Nostr, for communication between Alice and Bob to happen, both need to be connected to the same honest relay. If at any point they aren't connected to a common relay, the receiver, Bob, needs to somehow figure out which relay to connect to. Unless Bob knows a vast number of relays, and Alice not only knows some relay in common with Bob but also broadcasts to all the relays she knows (which might become expensive once usage scales), there's no mechanism within the protocol to reconnect them -- or even to detect that the connection was lost! This is a much stronger assumption than the one needed for Bitcoin.
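A toy Python model of the two delivery conditions being contrasted, assuming honesty is known per node or relay (it isn't in practice). broadcast_reachable checks for a path of honest intermediate nodes, as in Bitcoin's gossip network; nostr_reachable requires an honest relay in the intersection of the relays Alice writes to and Bob reads from. All names are hypothetical.

```python
from collections import deque


def broadcast_reachable(graph: dict[str, set[str]], honest: set[str], src: str, dst: str) -> bool:
    """Bitcoin-style gossip: delivery needs a path src -> dst whose intermediate nodes are honest."""
    if src == dst:
        return True
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for peer in graph.get(node, set()):
            if peer == dst:
                return True
            if peer in honest and peer not in seen:  # only honest nodes pass data onward
                seen.add(peer)
                queue.append(peer)
    return False


def nostr_reachable(alice_writes: set[str], bob_reads: set[str], honest_relays: set[str]) -> bool:
    """Relay model: delivery needs one honest relay that Alice writes to and Bob reads from."""
    return bool(alice_writes & bob_reads & honest_relays)
```

Any honest path keeps the first check true, while the second fails as soon as Alice's and Bob's relay sets stop overlapping, even if every relay involved is honest.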

Please note that Alice doesn't know Bob wants her messages, doesn't know which relays Bob can connect to, doesn't know how to find Bob by other means, and might not know that her messages are being censored. Bob might also have no idea Alice is being censored, as there isn't any censorship detection mechanism in the protocol.

So only broadcasting to all relays works -- and this isn't even part of the protocol.

Nostr will likely need some extra-protocol means for Alice to either store messages in something that's equivalent to a relay outside of the protocol (like an email newsletter), in which case Nostr sort of becomes redundant -- or will have to rely on some other sort of censorship-resistant broadcast network -- like Bitcoin. If the number of such "where to find me" messages isn't very large you could in fact use Bitcoin transactions as a means of communication (or other blockchain, though most aren't as censorship-resistant as Bitcoin) -- but to be usable that would have to be written in the protocol.

It looks like Nostr claims to be censorship-resistant, but that only works if there is already a secondary censorship-resistant channel for broadcasting your relay to everyone interested! Worse, people (or software) are just supposed to know how to use those other channels to make Nostr work. I don't think that makes a lot of sense.

I think fixing this requires, at the very least, that those external channels be enumerated and that a standard way of using them be specified in the protocol.

If that isn't a problem for Nostr, and is supposed to be left unspecified, then I have trouble understanding exactly what value Nostr is adding, especially when it's claimed to be a censorship-resistant alternative to Twitter.

lucash-dev, Dec 30 '22

> @GeorgeTsagk I think some analogous distributed hash table strategy might be found to address this? Events seem similar to files on IPFS, for example.

Yes, it could use techniques like DHTs. That's quite convenient, as events are already identified by a hash, although events in Nostr are much simpler structures than those of IPLD (the data model of IPFS).
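Since event ids are already SHA-256 hashes, a DHT-style placement is easy to sketch in Python. event_id follows the NIP-01 id definition (SHA-256 over the compact JSON array [0, pubkey, created_at, kind, tags, content]); closest_relays is a Kademlia-style XOR-distance selection in which a relay's node id is taken to be the SHA-256 of its URL. That node-id choice, and assigning events to relays this way at all, are assumptions; no NIP specifies them.

```python
import hashlib
import json


def event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """NIP-01 id: sha256 of the compact JSON array [0, pubkey, created_at, kind, tags, content]."""
    # Assumes ordinary content; NIP-01's exact escaping rules are not re-checked here.
    payload = json.dumps([0, pubkey, created_at, kind, tags, content],
                         separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def closest_relays(eid: str, relay_urls: list[str], k: int = 3) -> list[str]:
    """Kademlia-style: the k relays whose (hypothetical) node id is XOR-closest to the event id."""
    target = int(eid, 16)

    def distance(url: str) -> int:
        node_id = int(hashlib.sha256(url.encode("utf-8")).hexdigest(), 16)
        return node_id ^ target

    return sorted(relay_urls, key=distance)[:k]
```

Any client holding the same relay list computes the same k relays for an id, which is what would let a lookup go to a few relays instead of all of them.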

> It looks like Nostr claims to be censorship-resistant, but that only works if there is already a secondary censorship-resistant channel for broadcasting your relay to everyone interested!

I truly believe there's a lot of potential in integrating LN here. LN not only solves incentivization issues for relays and spam protection, but also offers a reliable, private, and censorship-resistant communication medium (delivering relay-related Nostr messages over LN payments). I believe the NIP-13 issue is a better place for this discussion.

GeorgeTsagk, Dec 30 '22

I think @lucash-dev brings up an important point. In a recent interview @fiatjaf said that currently clients suck because they naively broadcast and request events from all connected relays. If I understood him right, he would like to see clients be more precise about which relays they connect with in order to retrieve events.

This means that a client shouldn't just have a static list of relays it queries everything from. Instead, the static list should serve primarily as 1. repositories for the user's events, aka "home relays", and 2. a starting point for bootstrapping the user's social graph (basically retrieving their latest kind 2 and looking at which users are associated with which relays).
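One way a client could turn per-user home relays into a connection plan, sketched in Python. The home_relays mapping is assumed to come from whatever relay-recommendation events the client trusts, and the greedy set-cover heuristic is picked for illustration, not prescribed by this thread or any NIP: it repeatedly chooses the relay serving the most still-uncovered follows, up to a connection cap.

```python
def plan_relays(home_relays: dict[str, set[str]], max_relays: int = 10) -> dict[str, set[str]]:
    """Greedily pick relays so each followed pubkey is covered by one of its home relays."""
    # Pubkeys with no known home relay are left for fallback relays.
    uncovered = {pk for pk, relays in home_relays.items() if relays}
    plan: dict[str, set[str]] = {}
    while uncovered and len(plan) < max_relays:
        # For each relay, collect the still-uncovered pubkeys it would serve.
        coverage: dict[str, set[str]] = {}
        for pk in uncovered:
            for url in home_relays[pk]:
                coverage.setdefault(url, set()).add(pk)
        # Sorting first makes ties break deterministically (alphabetically).
        best = max(sorted(coverage), key=lambda url: len(coverage[url]))
        plan[best] = coverage[best]
        uncovered -= coverage[best]
    return plan
```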

I'm planning to start working on adding this more nuanced relay traversal to Coracle, but I'm a little confused about what the standard way is to find an appropriate relay:

  • NIP 01 specifies that kind 2 works to recommend a relay, but it seems unwieldy to un-recommend a relay using kind 5, since you're relying on what we know to be a very unreliable method of deletion. A single replaceable event like kind 3 would seem to work better.
  • Speaking of which, kind 3 seems to be the de-facto standard (at least, that's how astral gets recommended relays), but it relies on a user/client selecting a single, canonical relay on another user's behalf, rather than allowing the user to choose his "home relays" for himself.
  • NIP 19 specifies nevent and nprofile, which is very helpful, but they're only for sharing out of band, not within the protocol.
  • NIP 35 specifies that .well-known/nostr.json should have a relays key associating pubkeys with a list of relays. Is this populated from kind 2? Also, why would a relay want to advertise other relays for a user it already holds data for? Finally, I can't find anywhere that the nostr.json file is implemented.
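For the nostr.json route, resolving a name to relays only takes a JSON lookup. This Python sketch assumes the names/relays layout that NIP-05 documents for .well-known/nostr.json (the relays key is optional there); the HTTPS fetch from https://<domain>/.well-known/nostr.json?name=<name> is left out, and relays_for_name is a hypothetical helper.

```python
import json


def relays_for_name(nostr_json: str, name: str) -> tuple[str, list[str]]:
    """Resolve a name to (pubkey, relays) from a .well-known/nostr.json document."""
    doc = json.loads(nostr_json)
    pubkey = doc.get("names", {}).get(name)
    if pubkey is None:
        raise KeyError(f"{name!r} not listed in nostr.json")
    # Optional "relays" key: maps hex pubkeys to lists of relay URLs.
    relays = doc.get("relays", {}).get(pubkey, [])
    return pubkey, relays
```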

All that said, I think NIP 23 (sans read filters) solves this really elegantly. With that implemented, kind 3 could be used not to determine the home relay, but to find a relay that is likely to host the user's canonical list of home relays. Even that seems a little sketchy, though, since that one relay may have become obsolete for one reason or another, leaving the target pubkey unfindable. A better approach might be to normalize copying all recommended relays from kind 10897 (or kind 2) into petnames, resulting in duplicate #p tags.
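If duplicate #p tags became the norm, a reader of a kind 3 event would need to merge the relay hints per pubkey. A Python sketch, assuming NIP-02's ["p", <pubkey>, <relay url>, <petname>] tag layout; how existing clients treat duplicate p tags is not covered here, and relay_hints_by_pubkey is a hypothetical helper.

```python
def relay_hints_by_pubkey(contact_tags: list[list[str]]) -> dict[str, list[str]]:
    """Collect every relay hint per followed pubkey from kind 3 "p" tags, in first-seen order."""
    hints: dict[str, list[str]] = {}
    for tag in contact_tags:
        if len(tag) < 2 or tag[0] != "p":
            continue  # skip non-"p" or malformed tags
        urls = hints.setdefault(tag[1], [])
        relay = tag[2] if len(tag) > 2 else ""
        if relay and relay not in urls:  # ignore empty hints and repeats
            urls.append(relay)
    return hints
```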

staab, Jan 04 '23

This is an area of ongoing growth and research for nostr, but NIP 65 https://github.com/nostr-protocol/nips/blob/master/65.md has addressed many of the problems articulated here. Nostr clients will need to accept network partitioning (the naive default), use second-layer centralized services to index the network (the emerging standard), or use gossip-style recommendation schemes to traverse the network (partially solved by NIP 65's recommendation to publish kind 10002 events more broadly).
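For clients adopting NIP 65, reading a kind 10002 relay list is straightforward. A Python sketch: each ["r", <url>, <marker>] tag marks a read relay, a write relay, or both when the marker is absent. parse_relay_list is a hypothetical helper name.

```python
def parse_relay_list(event: dict) -> tuple[set[str], set[str]]:
    """Split a NIP-65 kind 10002 event into (read relays, write relays)."""
    if event.get("kind") != 10002:
        raise ValueError("not a relay list event")
    read: set[str] = set()
    write: set[str] = set()
    for tag in event.get("tags", []):
        if len(tag) < 2 or tag[0] != "r":
            continue  # only "r" tags describe relays
        url = tag[1]
        marker = tag[2] if len(tag) > 2 else None
        if marker in (None, "read"):  # no marker means both read and write
            read.add(url)
        if marker in (None, "write"):
            write.add(url)
    return read, write
```

Following NIP 65, a client would fetch an author's events from that author's write relays and deliver mentions of a user to that user's read relays.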

staab, Mar 23 '23