Nostr apps quickly bring data subscriptions to their monthly limit
Has anyone thought about smart filters, or is there a NIP for this? I.e. clients could ask relays, e.g. using Bloom filters, not to double-fetch events. I saw @mmalmi suggest that to reduce excessive data flow.
I'd suggest a filter like { not: { ids: bloomFilter }, ... }
where the bloom filter contains the event ids you already have.
I'll use it in Iris if someone has time to implement it on the relay side.
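For illustration, roughly what the client side could look like (a minimal sketch; the class, parameters, and encoding here are made up, not an existing nostr or Iris API):

```python
import hashlib

class SeenEventsBloom:
    """Minimal Bloom filter over hex event ids (illustrative only)."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, event_id: str):
        # derive k bit positions from the event id via double hashing
        digest = hashlib.sha256(bytes.fromhex(event_id)).digest()
        h1 = int.from_bytes(digest[:8], 'big')
        h2 = int.from_bytes(digest[8:16], 'big')
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, event_id: str):
        for pos in self._positions(event_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, event_id: str):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(event_id))

# the client adds every stored event id and sends the bit array
# (e.g. base64-encoded) as the 'ids' bloom filter in the 'not' clause
seen = SeenEventsBloom()
seen.add('ab' * 32)  # placeholder id
```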
Another filter suggestion: { fields: ['id'], ... }
so you get only the ids of the events that match your query.
Then you can 1) ask for only the events that you don't already have and 2) ask only one relay at a time. That's useful when you're connected to 10 relays and they might all send you the same events. You could save up to 90% bandwidth.
Bitcoin does something similar by advertising new transaction and block hashes with inv messages and only sending the full item in response to a getdata request.
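Sketched against nostr, that two-phase flow could look like this (the 'fields' key is the hypothetical extension above, not part of any current NIP):

```python
import json

def build_id_req(sub_id: str, base_filter: dict) -> str:
    # phase 1 (inv-like): ask every relay for ids only,
    # using the hypothetical 'fields' extension from above
    return json.dumps(['REQ', sub_id, dict(base_filter, fields=['id'])])

def build_fetch_req(sub_id: str, advertised_ids, local_ids) -> str:
    # phase 2 (getdata-like): fetch full events for unseen ids, from one relay
    missing = [i for i in advertised_ids if i not in local_ids]
    return json.dumps(['REQ', sub_id, {'ids': missing}])

print(build_id_req('sub-ids', {'kinds': [1], 'limit': 100}))
```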
Bloom filters can have false positives, so they might contain IDs that they are not intended to contain.
> Bloom filters can have false positives, so they might contain IDs that they are not intended to contain.
Yes, but you can set the false-positive rate low enough that it doesn't matter for "send me a history of everything" type queries. Whereas for DMs, for example, you might not want to miss even one in a million. And for subscriptions to future messages it would be counterproductive.
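For a sense of scale, the standard Bloom sizing formula m = -n·ln(p)/(ln 2)² gives the filter size in bits for n items at false-positive rate p:

```python
import math

def bloom_size_bits(n_items: int, fp_rate: float) -> int:
    # m = -n * ln(p) / (ln 2)^2
    return math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)

# 10,000 already-seen events at a 1% false-positive rate:
# ~95,851 bits, i.e. roughly 12 KB on the wire
print(bloom_size_bits(10_000, 0.01))
```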
Yeah, there's no perfect compromise. I'm testing a Cuckoo filter right now too, but Bloom filters are more commonly understood, so that's the preferred way; will post something simple in Python, quick and dirty.
With so much data on relays in the future, the filter size might become a problem? Outbound data counts too.
I also thought about a set of keys that pre-sorts the relay database. It's highly unlikely to find fast collisions when using just 8 random or fixed bits from an id hash as an additional key; same, I guess, as using a fast one-way function with a shorter bit length than SHA-256 for the Bloom bits. That might reduce the filter length and processing time. But such tweaks are probably not suitable for a widely accepted general sketch of how to do filters like that.
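A quick birthday-bound estimate shows how truncated ids behave (illustrative numbers only):

```python
import math

def collision_prob(n_ids: int, bits: int) -> float:
    # birthday approximation: P(any collision) ~ 1 - exp(-n^2 / 2^(bits+1))
    return 1 - math.exp(-n_ids ** 2 / 2 ** (bits + 1))

# 1,000 ids truncated to 25 bits: ~1.5% chance of at least one collision
print(collision_prob(1_000, 25))
```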
> { not: { ids: bloomFilter }, ... }
I guess I'll soon have a simple relay ready that understands that tag, just for testing. Just for fun and to learn some nostr and Iris.
> You could save up to 90% bandwidth
E.g. I guess bandwidth can also be greatly reduced if metadata, like profile picture links and images, is prioritized for filtering out duplicates, thereby preventing duplicate external network fetches on the client side.
Some measures could also be taken on the client side? After the initial filter post, when successive posts are expected, one can assume there will be spammy relays, so inbound spam on the open websocket could be limited there. Just brainstorming here :shrug:, no idea how much of that is already done.
Since we do not want to miss a single id, I might have found a fast and working sketch. We might not need classic Bloom or Cuckoo filters, since the id is already a good hash with enough entropy, and by changing the primes on the fly we can quickly build new filters to mitigate initial short-id collisions. So after a short time, ids that may be missing from the lead relay are found on the sub relays on new REQ calls. In a real scenario those random primes created on the fly could vary in length per relay to save more, or be longer to mitigate collisions, so no relay gets the same filter. In essence this filter just builds short ids from the 32-byte id hash to save space, since we only truly save bandwidth if the outbound data is small enough times n relay calls. With 25 bits as in the sample patch we can probably call up to 40 relays before it won't make any difference, and with just 5 or 10 we should save roughly 90%.
So if the first relay is the main one we call for bulk messages, we treat the others only as backups that need to give us only the events we do not already have. The sketch works just as easily the other way around, where relays themselves signal new prime filters.
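A condensed sketch of both sides of this idea, using the same ad-hoc 'not'/'mod' filter shape as the patch below (names are illustrative):

```python
import random
from sympy import isprime  # same dependency the patch below adds

def fresh_prime(bits: int = 25) -> int:
    # a fresh random prime per relay, so no two relays see the same filter
    p = 4
    while not isprime(p):
        p = random.getrandbits(bits) | (1 << bits)
    return p

def short_id(event_id_hex: str, prime: int) -> str:
    # the 32-byte event id is already uniform, so id mod prime is a cheap short hash
    return hex(int(event_id_hex, 16) % prime)

# client side: build the filter from already-seen ids
prime = fresh_prime()
seen_ids = {'ab' * 32, 'cd' * 32}  # placeholder ids
not_filter = [{'filter': [short_id(i, prime) for i in seen_ids], 'mod': prime}]

# relay side: only send events whose short id is not in the client's filter
def should_send(event_id_hex: str, not_filter: list) -> bool:
    pf = not_filter[0]
    return short_id(event_id_hex, pf['mod']) not in pf['filter']
```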
I guess if this is tested a bit, a new NIP-XX could be created to advertise such filters to relays, as an option for people who care about bandwidth usage.
Here is a diff against the nostrpy server/client source that has to be applied for testing the fast short-prime filter: https://github.com/monty888/nostrpy
When patched, it will expose a relay on localhost:8081 that understands this in the REQ:
{ "not": [{"filter": ["<IDS_mod_prime_in_hex>", ...], "mod": <random_primenumber_in_hex>}], ... }
The filter could later also be a bitfield, but for now it's a list.
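For illustration, a full REQ with the patched filter might look like this (all values made up; "mod" is the client's random ~26-bit prime, "filter" the short ids of already-seen events):

```json
["REQ", "my-sub", {
  "kinds": [1],
  "since": 1680000000,
  "not": [{"filter": ["0x114c0a3", "0x1e3db21"], "mod": 44040187}]
}]
```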
git clone https://github.com/monty888/nostrpy
cd nostrpy
python3 -m pip install -r requirements.txt
copy the prime filter patch (see "Click here" below) into a new file patch.txt
git apply patch.txt
and then run
python3 relay_mirror.py
to get some data
and then
python3 run_relay.py
to serve them locally
and in a second console
python3 cmd_event_view.py
to see the filter building up; on the next start of the relay it will only send events not matching that filter.
Click here :+1: to see the patch
diff --git a/cmd_event_view.py b/cmd_event_view.py
index 9eec93a..c620dc8 100644
--- a/cmd_event_view.py
+++ b/cmd_event_view.py
@@ -23,6 +23,8 @@ from nostr.event.event import Event
from nostr.encrypt import Keys
from app.post import PostApp
from cmd_line.util import FormattedEventPrinter
+import random
+from sympy import isprime
# TODO: also postgres
WORK_DIR = '/home/%s/.nostrpy/' % Path.home().name
@@ -31,9 +33,9 @@ DB_FILE = '%s/tmp.db' % WORK_DIR
# RELAYS = ['wss://rsslay.fiatjaf.com','wss://nostr-pub.wellorder.net']
# RELAYS = ['wss://rsslay.fiatjaf.com']
# RELAYS = ['wss://relay.damus.io']
-RELAYS = ['wss://relay.damus.io','ws://localhost:8081']
+# RELAYS = ['wss://relay.damus.io','ws://localhost:8081']
# RELAYS = ['wss://nostr-pub.wellorder.net']
-# RELAYS = ['ws://localhost:8081']
+RELAYS = ['ws://localhost:8081']
# AS_PROFILE = None
# VIEW_PROFILE = None
# INBOX = None
@@ -55,6 +57,9 @@ usage:
sys.exit(2)
+FILTER = [{'filter': [], 'mod' : 0x1fffffff }]
+
+
class ConfigException(Exception):
pass
@@ -307,6 +312,8 @@ def run_watch(config):
'since': util_funcs.date_as_ticks(since),
'kinds': [Event.KIND_TEXT_NOTE, Event.KIND_ENCRYPT]
}
+ e_filter['not'] = FILTER
+ # [{'filter': ['290253260',' 507307490','357358348'], 'mod' : 0x1fffffff }]
if until:
e_filter['until'] = until
# note in the case of wss://rsslay.fiatjaf.com it looks like author is required to receive anything
@@ -332,6 +339,12 @@
share_keys=share_keys)
def my_display(sub_id, evt: Event, relay):
+        # record the short id (event id mod prime) of every received event,
+        # so the next REQ can tell the relay which events we already have
+        for f in FILTER:
+            h = int(evt.id, 16) % f['mod']
+            f['filter'].append(hex(h))
+
my_print.print_event(evt)
my_printer.display_func = my_display
@@ -383,6 +400,15 @@ if __name__ == "__main__":
# logging.getLogger().setLevel(logging.DEBUG)
util_funcs.create_work_dir(WORK_DIR)
util_funcs.create_sqlite_store(DB_FILE)
+    # generate a random filter prime of at least n+1 bits
+    n = 25
+    prime = 4  # not prime, so the loop below always runs :-)
+ while not isprime(prime):
+ prime = random.getrandbits(n)
+ prime |= (1 << n)
+
+ for mod in FILTER:
+ mod['mod'] = prime
run_event_view()
# client = Client('ws://localhost:8081').start()
# client.query([{'kinds': [0], 'authors': []}])
diff --git a/nostr/event/event.py b/nostr/event/event.py
index 6ddb143..a063f52 100644
--- a/nostr/event/event.py
+++ b/nostr/event/event.py
@@ -211,6 +211,7 @@ class Event:
return ret
+
def __init__(self, id=None, sig=None, kind=None, content=None, tags=None, pub_key=None, created_at=None):
self._id = id
self._sig = sig
diff --git a/nostr/relay/relay.py b/nostr/relay/relay.py
index 5030857..83b96a5 100644
--- a/nostr/relay/relay.py
+++ b/nostr/relay/relay.py
@@ -52,6 +52,7 @@ class Relay:
"""
VALID_CMDS = ['EVENT', 'REQ', 'CLOSE']
+ not_ar = None
def __init__(self, store: RelayEventStoreInterface,
accept_req_handler=None,
@@ -60,7 +61,8 @@ class Relay:
description: str = None,
pubkey: str = None,
contact: str = None,
- enable_nip15=False):
+ enable_nip15=False,
+ not_ar = None):
self._app = Bottle()
# self._web_sockets = {}
@@ -313,8 +315,13 @@ class Relay:
'id': sub_id,
'filter': filter
}
-
+
logging.info('Relay::_do_sub subscription added %s (%s)' % (sub_id, filter))
+        # remember the client's prime filter (quick test: stored per-relay, not per-subscription)
+        for c_myarr in filter:
+            if 'not' in c_myarr:
+                self.not_ar = c_myarr['not']
+                logging.info('Relay::_do_sub not primef %s' % (self.not_ar))
# post back the pre existing
evts = self._store.get_filter(filter)
@@ -354,12 +361,26 @@
def _do_send(self, ws: WebSocket, data, lock: BoundedSemaphore):
try:
with lock:
- ws.send(json.dumps(data))
+ ws.send(json.dumps(data))
except Exception as e:
logging.info('Relay::_do_send error: %s' % e)
def _send_event(self, ws: WebSocket, sub_id, evt, lock: BoundedSemaphore):
- self._do_send(ws=ws,
+        c_evt = evt
+        send_it = True
+        # apply the client's prime filter, if one was sent with the REQ
+        if self.not_ar:
+            pf = self.not_ar[0]
+            mod_n = pf['mod']
+            h = int(c_evt['id'], 16) % mod_n
+            for da in pf['filter']:
+                if h == int(da, 16):
+                    send_it = False
+                    logging.info('Relay::_send_event filter match %s %s' % (self.not_ar, hex(h)))
+                    break
+
+        if send_it:
+            self._do_send(ws=ws,
data=[
'EVENT',
sub_id,
diff --git a/requirements.txt b/requirements.txt
index b525244..1f28da8 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -50,6 +50,7 @@ six==1.16.0
soupsieve==2.3.2.post1
spake2==0.8
stem==1.8.0
+sympy==1.10.1
toml==0.10.2
tqdm==4.64.0
Twisted==22.4.0
As of now the easy way to save bandwidth is to just use a few relays, but for mobile users that will quickly centralize the choices.
I've been thinking a lot about bloom filters too. This should address some of the motivation behind #515. It might even have way more impact than another encoding ... I would guess a lot of the events clients receive are duplicates, yet to determine they are duplicates the JSON still needs to be parsed.
If nostr is working well, duplicates are expected: the more duplicates, the more censorship resistance. If I on average get 5 duplicates for every event I query for, deduplicating with bloom filters would cut event traffic by roughly 80% (5 redundant copies out of every 6 received).
I have a feeling that clients only get so many duplicates because they ask all relays for all the things all the time, instead of treating relays differently and asking each relay for different sets of events per public key.
> instead of treating relays differently and asking each relay for different sets of events per public key
True but this sounds difficult to communicate/encourage.
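Just to make that concrete: a client could, for instance, deterministically assign each followed pubkey to a small subset of its relay list, with some redundancy for censorship resistance (entirely hypothetical sketch):

```python
import hashlib

def relays_for_author(pubkey_hex: str, relays: list, redundancy: int = 2) -> list:
    # deterministically map each followed pubkey to a small subset of relays,
    # so each relay is only asked for its slice of the follow list
    h = int(hashlib.sha256(bytes.fromhex(pubkey_hex)).hexdigest(), 16)
    start = h % len(relays)
    return [relays[(start + i) % len(relays)] for i in range(redundancy)]

relays = ['wss://relay.damus.io', 'wss://nostr-pub.wellorder.net', 'wss://nos.lol']
print(relays_for_author('ab' * 32, relays))  # placeholder pubkey
```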
Before jumping to solutions, the following questions come to mind. What kind of data usage have you seen, with which apps, and for how many followers? What's the split between notes and referenced images/video?
I've seen Amethyst pulling 64GB of data since May 1st with one paid relay plus the default ~20 relays and settings it ships with.
Would be a great topic for a survey.