Paths and Queries in Relay URLs
A relay is defined by its websocket server, which means that the protocol, host and port of a URL matter in differentiating relays.
Path and query do not determine which websocket server to connect to; they are sent after the connection is made, as part of the HTTP upgrade request.
Two different URLs with the same (protocol + host + port) point at the same relay, but may trigger different relay-specific behavior when addressed with a specific path and/or query.
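To make the (protocol + host + port) idea concrete, here is a minimal sketch of how a client might derive that key from a relay URL. The function name `relay_origin` and the npub-style path in the usage note are made up for illustration; the default ports are the standard ones for ws and wss.

```python
from urllib.parse import urlsplit

# Standard default ports for the two websocket schemes.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def relay_origin(url: str) -> tuple[str, str, int]:
    """Return (scheme, host, port) for a relay URL; path and query are ignored."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a websocket URL: {url!r}")
    host = (parts.hostname or "").lower()
    if not host:
        raise ValueError(f"relay URL has no host: {url!r}")
    # An omitted port means the scheme's default port.
    port = parts.port or DEFAULT_PORTS[scheme]
    return (scheme, host, port)
```

With this, `wss://filter.nostr.wine/npub1abc?global=all` and `wss://filter.nostr.wine/` give the same key, while the full URL can still be kept around for the actual connection.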
I find this very difficult to deal with. Nostr spec refers to relay URLs without any reference to path and/or query. I've presumed different URLs are different relays, even if they only differ in path or query. But this has caused problems:
- Too many connections to the very same host/relay, giving "429 Too Many Requests"
- Fetching many times from the same host/relay over multiple websocket connections
- Posting the same event many times to the same host/relay
- In gossip it is very difficult for users to manage hundreds of variations of the same relay in the relay pages. I want to set rank=0, but there are far too many pages I have to do this on, and if a new npub path comes up it won't apply to those.
If however I don't preserve the path and query, then the relay doesn't get the benefit of having the path and query (whatever functionality it was intending to provide by having it).
Before making massive sweeping changes to gossip I'd like to hear what others think.
Interesting question. Even if there are valid use cases for paths/query strings, a client can't tell if these are minor configuration options, or consist of an entirely separate data set. Separate thought: attackers could just generate lots of variations of a single relay URL and DoS the relay (and potentially a user's client).
My first thought would be to prohibit paths or query strings and say clients MUST normalize to just the domain name. This wouldn't preclude relays from parameterizing their url via subdomain, but failures would happen at the DNS level, rather than what currently happens with relays which accept connections at any path. This would of course break current implementations like nostr.wine.
This seems mostly like a relay problem. Relays should reject connections on anything other than a valid, meaningful path. If a client is connected to the same relay twice, the second connection should be rejected (unless the path indicates a different "virtual relay").
Path differentiation is crucial for any relay who wants to offer different services in the same process, like different filtering options, without having to bloat the protocol with unsupported flips and switches.
A simple example:
- wss://relay.com/ -> returns all notes
- wss://relay.com/pt -> returns only notes in Portuguese
- wss://relay.com/jp -> returns only notes in Japanese
Why not wss://jp.relay.com? I'm not saying this is the right solution, but you don't necessarily need paths.
What is happening in reality here? Is someone spamming the Gossip database with fake relay URLs? I don't think banning paths would help in the case of an evil spammer as they can just spam subdomains or just make up fake domains.
I think a reasonable solution would be to add a flag to NIP-11 like "paths_are_virtual_relays": true or something like that, so nostr.band and filter.nostr.wine could set that. And also the relay I'm working on.
Why not wss://jp.relay.com? I'm not saying this is the right solution, but you don't necessarily need paths.
This would be the solution in case paths were blocked, but it can be harder to implement in some cases -- and then we would be back to the same problem of how to differentiate between a virtual relay and a normal relay.
What is happening in reality here?
My gossip install knows of hundreds of filter.nostr.wine relays. Gossip tries to connect to many of them, and loops forever: it gets "429 Too Many Requests" and tries again later. I also can't easily set them all to rank=0 to stop this chatter because each one is seen by gossip as a separate relay. So I'm trying to figure out what to do next. Maybe I should just ignore it and consider it "their problem"? Maybe I can come up with a new concept of a relay endpoint, and manage these endpoints as a single thing instead of hundreds of things based on path variations. I don't know, I wanted feedback.
What I did in some codebases was to hardcode filter.nostr.wine and relay.nostr.band as having these special virtual paths that should be ignored by outbox relay selection, but I think we should standardize a field on NIP11 to help with this.
I'm in favor of a NIP-11 field.
Yea can we have a canonical URL in the relay document?
What would be a canonical URL?
I guess it would be whatever is the root URL which serves the same dataset.
So take 3 relays: the first is /, the second is /fr and the third is /jp. For example, /fr and /jp serve only content in those languages, but they work more like a preset filter on all requests. If you write events to any of the 3 they go into the same database, so all of them are available on /.
In this case for outbox, just because somebody wants to read posts in Japanese and French doesn't mean that they need to hit /jp or /fr to get those posts; they can just connect to / and get everything.
Maybe in the case of filter.nostr.wine the canonical url is actually wss://nostr.wine
This requires me to have prior knowledge about what a path means. I'm trying to write general code that doesn't have special cases in it.
So I'm going to use (protocol, host, port) as a RelayServer type, and some of the data we store per RelayUrl will move to the RelayServer struct. That will solve the relay management problem (just one config page for all of filter.nostr.wine).
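Roughly, that split could look like the sketch below. This is not gossip's actual code (gossip is written in Rust); the names `RelayServer`, `ServerSettings` and `RelayBook` and the default rank value are placeholders. Settings live on the (protocol, host, port) key, and the full URLs seen for that server are only remembered alongside it.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

DEFAULT_PORTS = {"ws": 80, "wss": 443}


@dataclass(frozen=True)
class RelayServer:
    """Identity of a relay server: protocol, host and port only."""
    scheme: str
    host: str
    port: int

    @classmethod
    def from_url(cls, url: str) -> "RelayServer":
        parts = urlsplit(url)
        scheme = parts.scheme.lower()
        port = parts.port or DEFAULT_PORTS[scheme]
        return cls(scheme, (parts.hostname or "").lower(), port)


@dataclass
class ServerSettings:
    rank: int = 3  # placeholder default, not necessarily what gossip uses
    urls: set[str] = field(default_factory=set)  # every full URL seen for this server


class RelayBook:
    """Per-server settings, shared by all URL variations of that server."""

    def __init__(self) -> None:
        self.servers: dict[RelayServer, ServerSettings] = {}

    def observe(self, url: str) -> ServerSettings:
        settings = self.servers.setdefault(RelayServer.from_url(url), ServerSettings())
        settings.urls.add(url)
        return settings

    def set_rank(self, url: str, rank: int) -> None:
        self.observe(url).rank = rank
```

Setting rank=0 through any one URL then applies to every path on that server, including paths that only show up later.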
But for outbox relay specifications I will still use RelayURL. That means I am still stuck with the /jp (for example) or someone's npub (for example) if somebody puts such relay URLs in their relay list. I could stop the gossip user from doing that to their own relay list by detecting any kind of path or query in the URL and warning them. But I will still find these all over the place in other people's relay lists, and for that I guess I have to fall back to special case code?
And if so, that indicates to me that the nostr protocol is lacking something... nobody should need to write special case code to make nostr work properly. But I'm not convinced what that something is.
I have this problem with paths and query strings too with nostr.band indexer. There is essentially no way to know what exactly the relationships are btw different paths, as those can and are and will be even more 'weird' as fiatjaf says. I doubt we can come up with proper announcement syntax for all that weirdness.
Also, if someone wanted to spam the network with shitty relays they would not be announcing the 'canonical' url.
Also, most (all?) of current relays with legit uses of paths and queries are same dataset and same server with shared rate limits, with minor weird differences btw paths, so treating them all as one relay by default is the correct strategy.
I think we should say in the nips that by default same origin means the same relay.
And then if some relay admin actually thinks clients should treat their paths as separate relays (and thus is ready to handle the resulting overload, too many connections, potential duplicates requested and served, etc) then they should announce that in the nip11 file (i.e. "custom_relay":true for each path). And then clients can choose to abuse those relays with many connections but should watch them and ban the origin's paths if they return too many 429 or return too many duplicate events.
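As a sketch of that default, assuming the relay's NIP-11 document has already been fetched and parsed into a dict: the `custom_relay` field is only the name floated above and is not part of NIP-11 today, and `management_key` and `relay.example` are made-up names.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"ws": 80, "wss": 443}


def management_key(url: str, nip11: Optional[dict] = None) -> str:
    """Key a client groups relay state under.

    Default: same origin means same relay. Only if the relay opts in with the
    hypothetical `custom_relay` flag are path and query kept as part of the key.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    origin = f"{scheme}://{(parts.hostname or '').lower()}:{port}"
    opted_in = bool(nip11) and nip11.get("custom_relay") is True
    if not opted_in:
        return origin
    suffix = parts.path or "/"
    if parts.query:
        suffix += "?" + parts.query
    return origin + suffix
```

Counting 429 responses per origin (rather than per URL) would fit the same key, so a ban applies to all of an origin's paths at once.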
Does this make sense?
Maybe just assuming every path is a virtual relay is indeed more sensible than adding stuff to NIP-11.
So if you don't care about the virtual stuff you just remove all paths.
We've been trying to drop the npub path from filter.nostr.wine for over a year now but unfortunately major clients still refuse to implement NIP-42 (and most existing implementations still do not work correctly). Our paths are terrible for a lot of reasons and were never meant to be a permanent solution.
Stricter rate limits were added more recently because of rogue automated gossip-model clients with poor reconnection logic.
Part of this problem is users don't know where to put relays like filter (proxy relays that aren't for public use). They need to go in an app-relay list that is NOT a part of their NIP-65. There is no value in connecting to filter.nostr.wine if you are not a paying customer.
We'll adopt whatever is decided.
Having messed with my code for more than a week trying different ways of supporting URLs for some things but making it manageable for the user, I'm still left pulling my hair out.
If we made the following restrictions, my job would be so much easier:
- Relays must service their root path URL (with no query part required either)
- kind 10002 relays and relay hints must be specified at the root path (or clients can ignore the path/query parts and they must work)
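Under those two restrictions, handling relay lists and hints could be as simple as the sketch below. The helper names `to_root_url` and `collapse_relay_list` are made up, and the npub-style paths in the usage note are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def to_root_url(url: str) -> str:
    """Drop path and query, keeping only scheme and host[:port] with a root path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), "/", "", ""))


def collapse_relay_list(urls: list[str]) -> list[str]:
    """Map every URL to its root URL and drop duplicates, preserving first-seen order."""
    seen: dict[str, None] = {}
    for url in urls:
        seen.setdefault(to_root_url(url), None)
    return list(seen)
```

For example, `wss://filter.nostr.wine/npub1aaa?global=all` and `wss://filter.nostr.wine/npub1bbb` both collapse to `wss://filter.nostr.wine/`. This only works if the first restriction holds: a relay that answers only on a non-root path would become unreachable.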
I don't care if people use relay paths for specialized things, and we need that I'm sure. But due to the fact that paths have no defined meaning in the NIPs, and the nips define relays as URLs, different paths currently define different relays, and that is kind-of a mess. If I could only assume the root path was valid and good enough, I could put this week of (feels like wasted) work to bed and move on.
The options I could do are:
- Keep things as they are, have hundreds of filter.nostr.wine relays defined and unmanageable
- Keep things as they are, but special-case ignore the paths on filter.nostr.wine (and maybe a few others)
- Force all relay urls back to their root path and assume all relays can always handle that (my preference right now)
- Keep the paths when connecting to a relay (in case it matters) but collapse management of relays per-origin instead of per URL (I have a branch for this, but it still has its own subtle problems)
I am using /relay in my software because the homepage is... a homepage
- https://gleasonator.dev/
- wss://gleasonator.dev/relay
relays and relay hints must be specified at the root path
I feel like this would only be a short term fix, as relay providers could still differentiate by subdomain.
It seems to me the root here is simply poor relay selections. Either relays like blastr which never return any results, or nostr.wine relays which return all the same events. In either case, the real solution would probably be analysis of what events for a given filter are returned from a given relay. If two relays always return the same events, the client could notify the user, "hey, you're probably not getting much out of having both of these relays". This kind of thing could be useful for more general analysis, like "these relays overlap by 93%" which users could use to optimize their relay selections (or have clients do it for them).
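A rough sketch of that kind of analysis, assuming the client has already collected the set of event ids each relay returned for the same filter. The overlap measure here is Jaccard similarity (shared ids divided by all ids seen), which is just one reasonable choice; the function names and the 0.9 threshold are arbitrary.

```python
from itertools import combinations


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two sets of event ids: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def redundant_pairs(
    results: dict[str, set[str]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return (relay_a, relay_b, overlap) for relay pairs at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(results), 2):
        score = overlap(results[a], results[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs
```

A client could run this over a window of recent subscriptions and surface pairs that stay above the threshold, rather than deciding from a single query.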
It seems to me the root here is simply poor relay selections.
I suppose though that the user doesn't have much control over other peoples' selections, or relay hints.
I am using /relay in my software because the homepage is... a homepage
- https://gleasonator.dev/
- wss://gleasonator.dev/relay
That was the case I was worried most about. Good to have it confirmed I guess.
Ok I'm going to throw away a week's work, follow the current NIP standard, and work on something else. I need to get past the sunk-cost fallacy and just drop it I guess.
I am using /relay in my software because the homepage is... a homepage
There are other ways to do this, a relay can be served from the same URL as a webpage.
For example: https://lunchbox.sandwich.farm, https://relay.nostr.watch, https://user.kindpag.es
Doesn't change anything ofc, relays will be served from wherever people want to serve them. Definitely makes things difficult though.