
Querying events by tag presence

fernandolguevara opened this issue 2 years ago • 35 comments

An attempt to stop using different zoom levels for location-specific use cases


fernandolguevara · Jul 26 '23 12:07

This is #523

It's a presence filter specifically.

alexgleason · Jul 26 '23 12:07

I think this would solve many problems and we should have it. But there are challenges in relays actually implementing it: https://github.com/nostr-protocol/nips/issues/523#issuecomment-1650244105

We would have to define which specific tags get the index. We can't do all of them without a full table scan.

alexgleason · Jul 26 '23 13:07

[...] there are challenges in relays actually implementing it [...]

@alexgleason Relays already have to deal with indexing arrays of values, like the array of "e" tag values. An array of the tag names present in an event would be similar, so no problem. If the relay DB isn't able to do a "not contains" type of query, it can create an array of absent tag names (A-Za-z minus the present ones).
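This absent-tag-names trick can be sketched in Python (the helper name is invented for illustration, not from any relay codebase):

```python
import string

def absent_tag_names(tags):
    """Given an event's tags (arrays of the form [name, value, ...]),
    return the single-letter tag names NOT present in the event."""
    present = {t[0] for t in tags if t and len(t[0]) == 1}
    return sorted(set(string.ascii_letters) - present)

# An event carrying only "e" and "p" tags: indexing the returned array
# lets a DB answer "has no g tag" with a plain "contains" lookup on "g".
absent = absent_tag_names([["e", "abc123"], ["p", "def456"]])
```

A relay would store this array alongside its usual tag-value indexes, trading index size for the ability to run absence queries as plain containment checks.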

arthurfranca · Jul 26 '23 17:07

You would have to basically double the amount of space used by tag indexes. Maybe more since you can't use a partial index if you want both presence and absence filters. It's probably worth doing it at least for "e" tags since we have a strong use-case in #523 that is a very common one. Other tags would need strong arguments in favor of doing it, I think.

Still, I think that shouldn't necessarily block this NIP from proceeding. We can nail down the API.

alexgleason · Jul 26 '23 17:07

If this NIP-100 gets merged but implementing it isn't a requirement, in practice it may be as if it never existed, because no client would be able to rely on it since not all relays would implement it. When fetching the feed, clients would probably continue requesting all notes, whether root or not, because many relays are potentially involved.

@fiatjaf any chance you mark this NIP, if merged, and others like NIP-12 and NIP-20, as "Required" on the README.md NIP list? That would mean they are effectively part of NIP-01: minimum NIPs that must be supported for a relay/client to be considered nostr-compatible. Also, as an exception, put them near the top just below NIP-01 despite their numbers.

arthurfranca · Jul 26 '23 17:07

Me stating a thing is required doesn't cause everybody to immediately implement it.

Also, as @alexgleason said in the other issue, it's very costly for relays to implement this. I think we absolutely do not need it.

fiatjaf · Jul 27 '23 11:07

I think that if your client depends on this you're implementing something wrong. If you're following a person and want to read what they write, you want all their kind 1s. Regardless of whether you'll display everything in the UI or in different views according to the tags, you should still download everything, store it locally, and display it when appropriate.

fiatjaf · Jul 27 '23 11:07

Also, as @alexgleason said in the other issue, it's very costly for relays to implement this

I disagree, or else I wouldn't be supporting this, but I don't know which DB you guys are considering.

[...] you still should download everything, store locally and display when appropriate.

Although some clients are doing this, I don't consider it efficient. But if doing everything client-side is the recommended way, I think it's ok to close this PR and the other issue.

arthurfranca · Jul 27 '23 15:07

I like this.

I think that if your client depends on this you're implementing something wrong.

Not exactly. There are use cases where this is a real need. If you want Global without replies, for instance, it doesn't make sense to download everything and then filter replies out.

If you are doing a map of Nostr posts with a GeoHash, it doesn't make sense to download everything and then discard everything that doesn't include a g tag (which is almost everything right now - huge waste).
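The waste can be made concrete with a sketch (Python, names invented): without a presence filter, the map client must download every event and discard the ones lacking a g tag client-side.

```python
def has_g_tag(event):
    """True if the event carries a geohash "g" tag."""
    return any(t and t[0] == "g" for t in event["tags"])

# The client downloads all three events but keeps only one; a relay-side
# presence filter would have sent just the g-tagged event.
events = [
    {"id": "1", "tags": [["p", "aa"]]},
    {"id": "2", "tags": [["e", "bb"]]},
    {"id": "3", "tags": [["g", "u4pruyd"]]},
]
geo_events = [ev for ev in events if has_g_tag(ev)]
```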

vitorpamplona · Jul 27 '23 16:07

If you want Global without replies, for instance, it doesn't make sense to download everything and then filter replies out.

I can understand this, but does anyone really want this? Sounds like some skewed preferences here. "Global without replies". Global is not a thing, and replies are not different from normal notes, technically. Should they have a different kind?

If you are doing a map of Nostr posts with a GeoHash, it doesn't make sense to download everything and then discard everything that doesn't include a g tag (which is almost everything right now - huge waste).

This I don't think is a valid use case (I mean, whatever, it is valid, but what I'm saying is that it doesn't fit Nostr; not all things fit Nostr if we want Nostr to remain simple). Either you are already fetching posts from people that you want, storing these locally somehow, and then displaying those that have g tags on a map -- or you should be using the map for a more restricted set of events that aren't kind 1. Like ads for a local marketplace or whatnot; in this case you expect them all to have g tags.


Maybe we should be making more kinds for different types of events and relying less on tags for indexing. Since tags are so flexible it's easy to think they should be used for everything, but if we start doing that and relying on that this will not end well.

fiatjaf · Jul 27 '23 21:07

I don't consider it efficient

what is not efficient? To store events that you want locally? You think it's more efficient to load them from relays over and over multiple times every day?

fiatjaf · Jul 27 '23 21:07

Can we come back to this, please? Has any relay tried to implement it?

@mikedilger since you have just coded a relay, what do you think about this filter?

vitorpamplona · Feb 29 '24 14:02

what is not efficient? To store events that you want locally? You think it's more efficient to load them from relays over and over multiple times every day?

"Feed" events, for example. This event set gets stale so often that when the user re-opens the app they aren't interested anymore in the previously received events. That's why I believe these events should live in memory instead of in a persistent local DB.


My unreleased client's "feed" is made of root events (no e tags) and top-level replies (with one e tag), so this PR wouldn't help unless I could filter by the number of e tags.
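That feed rule amounts to a tag-count predicate; a client-side sketch in Python (function names invented):

```python
def e_tag_count(event):
    """Number of "e" tags on an event."""
    return sum(1 for t in event["tags"] if t and t[0] == "e")

def in_feed(event):
    # Root events (no e tags) or top-level replies (exactly one e tag).
    return e_tag_count(event) <= 1

root = {"tags": [["p", "aa"]]}
top_reply = {"tags": [["e", "bb"], ["p", "aa"]]}
nested_reply = {"tags": [["e", "bb"], ["e", "cc"]]}
```

A presence filter alone could not express the "at most one" condition, which is the point being made: tag counting is another axis the proposed filter doesn't cover.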

Edit: removed ugly syntax examples. It would be great to have this but I know it won't happen =]~

arthurfranca · Feb 29 '24 15:02

It seems like the solution to everything not in NIP-01 is DVMs.

alexgleason · Feb 29 '24 17:02

I regret posting that.

alexgleason · Feb 29 '24 17:02

It seems like the solution to everything not in NIP-01 is DVMs.

Maybe we should indeed align our expectations to what the core protocol should solve for and what is expected of a Layer 2 design (DVMs) to do it. If we want to keep the relay dev simple, we should "outsource" everything to layer 2.

Or maybe we just create a new network of relays working on the same events but with more interesting filtering options. Clients can then choose which network they want/need to integrate with.

vitorpamplona · Feb 29 '24 17:02

I don't think filters are getting any more changes at this point.

I need more than just presence/absence tags, anyway. I need joins.

alexgleason · Feb 29 '24 17:02

@Semisol had some interest in building a new type of relay with a new filtering language. I am not sure if he ended up doing anything. But we could just do a relay with regular read-only SQL as an entry point.

vitorpamplona · Feb 29 '24 18:02

Or maybe this idea of filters and subscriptions themselves should be turned into replaceable events. I can imagine a client signing an event with a Nostr filter (or an SQL query) instead of using the REQ call. The relay would simply attach whatever comes in that event as a subscription to that connection. The d tag becomes the subscription id, then.

vitorpamplona · Feb 29 '24 18:02

@alexgleason

I don't think filters are getting any more changes at this point.

I need more than just presence/absence tags, anyway. I need joins.

Can you elaborate on joins? What's your use case?

fernandolguevara · Mar 01 '24 14:03

Ok I just went back and read #523 and this issue again, and I have a few more things to say:

First, I have not encountered the need to do presence or absence (or tag count!) queries. But I don't think it is unreasonable.

Second, this idea that relays will need to do complex indexing is wrong. Relays should not index for these kinds of queries at all. Neither should they do hard scans of every event. Relays should (1) require that such filters also contain other fields that already narrow down the event set to something reasonable, or else reject the filter as a scraper, (2) load all the events ignoring the new presence/absence filter specifications, and (3) post-filter the matching events with these new fields. Sure, you loaded more events than you needed and then stripped them back... but that is far less resource-consuming than sending them over the network and having the client strip them back. Basically it just pushes that filter operation to the relay to save on network bandwidth.
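The three steps can be sketched in Python; the "has"/"missing" field names and the db_query callable are assumptions for illustration, not part of any NIP:

```python
def matches_presence(event, filt):
    """Step 3: post-filter an event against hypothetical
    "has"/"missing" presence fields (names assumed for illustration)."""
    names = {t[0] for t in event["tags"] if t}
    return (all(tag in names for tag in filt.get("has", [])) and
            all(tag not in names for tag in filt.get("missing", [])))

def handle_req(filt, db_query):
    new_fields = {"has", "missing"}
    # Step 1: presence fields must ride along with something that
    # already narrows the event set, or the filter is a scraper.
    narrowing = {"ids", "authors", "#e", "#p", "since", "until"}
    if (new_fields & filt.keys()) and not (narrowing & filt.keys()):
        raise ValueError("filter too broad; rejecting as a scraper")
    # Step 2: load candidates ignoring the presence fields entirely.
    candidates = db_query({k: v for k, v in filt.items()
                           if k not in new_fields})
    # Step 3: strip the extras before they ever hit the network.
    return [ev for ev in candidates if matches_presence(ev, filt)]
```

The rejection in step 1 is what keeps the post-filter cheap: the candidate set is already small before the in-memory pass.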

That being said, if a crafty relay developer wants to index these to boast about hyper-fast performance, that's fine, but we don't need to design for that case. And seriously, if someone sends a "give me all events that don't have a geo tag" filter, were you really going to send them 99.9% of all the events in your database? I don't think so.

I don't like modifying the "#e" to be a non-array (e.g. having a 'null' option). I prefer this PR's method of adding a new field.

Clients SHOULD check NIP-11 before using the new field. But also the rule for relays ought to be "if you see a filter field you do not recognize, that is an error". I don't know if that was codified elsewhere but I think it should be.

I don't follow the need to count for the number of "e" tags, especially if we are moving to "q" tags.

I think this PR is pretty close as is. I'll add it to my relay if there is the momentum to do it (not too easy for me as I have meticulous memory layouts and detailed parsing to update).

EDIT: I don't think this NIP should become required or part of the core of nostr. It will be okay if most relays don't implement it. Clients will have to deal with errors from relays that don't accept the new field. BUT we probably do have to push through one small required change, which is to make those errors machine-readable (a new prefix) and to specify that relays must reject filters with fields they do not recognize (I didn't check the current NIPs; maybe that is already there).

mikedilger · Mar 03 '24 21:03

it just pushes that filter operation to the relay to save on network bandwidth

Interesting take. A caveat is that it may mess with the "limit" filter: if a client asks for limit:1, the relay fetches 1 record from the DB, then filters that one out after running the presence/absence/tag-count check in memory and returns 0 records, when there could be a matching item in the DB. Could be a good trade-off.
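The caveat can be shown concretely (Python sketch; the "has" field name is hypothetical):

```python
def naive_limited_query(db_events, filt):
    """Apply "limit" at the DB layer, then post-filter the hypothetical
    "has" presence field in memory."""
    limited = db_events[: filt.get("limit", len(db_events))]
    return [ev for ev in limited
            if all(tag in {t[0] for t in ev["tags"]}
                   for tag in filt.get("has", []))]

db = [
    {"id": "1", "tags": [["p", "aa"]]},       # newest row, no g tag
    {"id": "2", "tags": [["g", "u4pruyd"]]},  # older row, would match
]
# limit:1 fetches only the newest row; post-filtering then drops it,
# so the client gets 0 results even though a match exists in the DB.
result = naive_limited_query(db, {"limit": 1, "has": ["g"]})
```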

the rule for relays ought to be "if you see a filter field you do not recognize, that is an error"

That's the part I disagree with. If incompatible relays simply ignore the unknown filter field and apply just the ones they understand, the client can still apply the extra filter client-side. The client would still have the option of checking NIP-11 to skip incompatible relays if it prefers not to re-filter client-side.

arthurfranca · Mar 04 '24 14:03

it just pushes that filter operation to the relay to save on network bandwidth

Interesting take. A caveat is that it may mess with the "limit" filter: if a client asks for limit:1, the relay fetches 1 record from the DB, then filters that one out after running the presence/absence/tag-count check in memory and returns 0 records, when there could be a matching item in the DB. Could be a good trade-off.

Oh right.

the rule for relays ought to be "if you see a filter field you do not recognize, that is an error"

That's the part I disagree with. If incompatible relays simply ignore the unknown filter field and apply just the ones they understand, the client can still apply the extra filter client-side. The client would still have the option of checking NIP-11 to skip incompatible relays if it prefers not to re-filter client-side.

The problem I'm worried about is a client specifying a new filter field the relay doesn't understand in order to prune the search to something reasonable, but the relay skipping that new filter and dumping a massive number of events on the client.

mikedilger · Mar 04 '24 21:03

Instead of adding new filter properties, we can include an extension to NIP-50's search property, like:

{ "search": "has:#e" }

This fixes everything IMO.

alexgleason · Mar 06 '24 20:03

Instead of adding new filter properties, we can include an extension to NIP-50's search property

Special syntax for searches is super annoying because it mixes data with code. What if someone wants to search for a note that includes "has:#e"? Why not just add a new filter property, e.g. has: ["e", "a"]

staab · Mar 06 '24 20:03

It's already part of NIP-50 https://github.com/nostr-protocol/nips/blob/master/50.md#extensions

To search for a note with "has:#e" in the text, you'd do this:

{ "search": "\"has:#e\"" }

The functionality in question (filter by tag presence/absence) IS a search functionality. It makes most sense for search relays to implement it.

Also the key:value syntax is common among search engines, and is even used by Postgres and SQLite FTS. You have to quote strings for them to not be treated as search tokens.
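A sketch of that quoting rule (the regex tokenizer is illustrative, not NIP-50's normative grammar): bare key:value tokens become extensions, while quoted tokens stay literal search terms.

```python
import re

# Either a quoted string or a bare run of non-whitespace characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def parse_search(search):
    """Split a search string into extension key/value pairs and plain
    terms. Quoting a token like "has:#e" keeps it a literal term
    instead of treating it as an extension."""
    extensions, terms = {}, []
    for quoted, bare in TOKEN.findall(search):
        if quoted:
            terms.append(quoted)
        elif ":" in bare and not bare.startswith(":"):
            key, value = bare.split(":", 1)
            extensions[key] = value
        else:
            terms.append(bare)
    return extensions, terms
```

A search relay would then hand the plain terms to its FTS engine and interpret extensions such as has separately.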

alexgleason · Mar 06 '24 20:03

Why not just add a new filter property, e.g. has: ["e", "a"]

Because after a lot of discussion and many months, I realized it's not going to happen. And it probably shouldn't happen.

alexgleason · Mar 06 '24 20:03

It's already part of NIP-50

I did not realize that. Lame.

Also the key:value syntax is common among search engines

Yes, and I've spent way too much time dealing with user inputs that include special characters. As far as I'm aware, with Postgres at least you have to do the escaping in your application code, which is painful and error-prone. There's no reason we need to repeat the mistakes of the past.

staab · Mar 06 '24 20:03

Check also #1105 to see another application of NIP-50 extensions. It makes sense to do advanced filtering there.

alexgleason · Mar 06 '24 20:03

I'm not saying these aren't useful, but cramming them into a plain-text field is a mistake. Instead of new keys you could add an "extensions" key with the same syntax. But since it's already in NIP-50 it's probably a done deal and there's no point arguing.

staab · Mar 06 '24 21:03