status-mobile
Investigate topic/mailserver usage
Problem
Quite a few users have reported long syncing times, especially after some usage of the app. This task is an investigation into that behavior.
I managed to replicate the long syncing times. I dumped the topics my app is listening to and checked against the database how many envelopes are present per topic.
This is the script used to build the query with the topics:
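The script itself was attached rather than pasted; a minimal sketch of the idea, assuming a Postgres `envelopes` table with a bytea `topic` column and an epoch `timestamp` (the table and column names here are assumptions, not the real status-go schema):

```python
# Hypothetical sketch: given the topics the app is listening to, build a SQL
# query counting stored envelopes per topic over the last few days.
def build_count_query(topics, days=3):
    # topics are 4-byte hex strings like "4ad29422" (Postgres bytea literals)
    literals = ", ".join("'\\x{}'".format(t) for t in topics)
    return (
        "SELECT topic, count(*) AS n FROM envelopes "
        "WHERE topic IN ({}) "
        "AND timestamp > extract(epoch from now() - interval '{} days') "
        "GROUP BY topic ORDER BY n DESC;".format(literals, days)
    )

print(build_count_query(["4ad29422", "a4e9082f"]))
```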
This is the last 3 days of all the topics broken down by count:
And this is the last 3 days of the topics my app is following:
There are a number of heavy hitters present in both:
   topic    | count
------------+------
 \x4ad29422 | 5457
 \xa4e9082f | 4920
 \xa0204a26 | 4848
 \x72bee2e4 | 4311
 \x82fd27dc | 4226
 \x2b355ff6 | 4093
 \x5422f4bd | 4063
 \x3dd7d6a8 | 3334
 \xcd423760 | 1333
 \x859b04b5 | 1183
This is unexpected. We expect a single topic with many messages (our partitioned topic), but the rest should not be this heavy (these are all the heaviest topics in the app).
I will try to figure out where these topics are coming from.
After further investigation I discovered that I had a lot of orphaned topics:
Out of roughly 500 topics, only 120 had an associated filter. This meant I was fetching a lot of messages that I would never have been able to decrypt. After cleaning up the topics the issue went away, so there's clearly a bug somewhere that causes topics to be leaked.
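The orphan check described above can be sketched as a simple set difference: a topic is orphaned when the app subscribes to it but no filter can decrypt its messages. The data shapes are illustrative, not the real status-go structures:

```python
def orphaned_topics(subscribed, filters):
    # filters: mapping of filter-id -> the topic that filter listens on;
    # any subscribed topic not covered by a filter is fetched but undecryptable
    covered = set(filters.values())
    return sorted(t for t in subscribed if t not in covered)

subscribed = {"4ad29422", "a4e9082f", "cd423760"}
filters = {"f1": "a4e9082f"}
print(orphaned_topics(subscribed, filters))  # -> ['4ad29422', 'cd423760']
```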
Regarding the heavy-usage topics, I haven't figured them out yet. They were orphaned, but there was a chat id associated with them, a one-to-one chat id; it's unclear why that would be.
0x4ad29422
0x042c8de3cbb27a3d30cbb5b3e003bc722b126f5aef82e2052aaef032ca94e0c7ad219e533ba88c70585ebd802de206693255335b100307645ab5170e88620d2a81
0x4ad29422
0x04606ae04a71e5db868a722c77a21c8244ae38f1bd6e81687cc6cfe88a3063fa1c245692232f64f45bd5408fed5133eab8ed78049332b04f9c110eac7f71c1b429
They are neither marked as negotiated topics nor discovery, so it's not clear where they are coming from.
The next step is to consolidate the logic so that topics are always calculated from filters, which should ensure consistency between the two, and would be a first step toward moving this logic to status-go.
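The consolidation proposed above can be sketched like this: instead of keeping a separate topic table that can drift out of sync, derive the topic set from the active filters on demand (data shapes illustrative):

```python
def active_topics(filters):
    # the filters are the single source of truth: a topic exists only while
    # some filter references it, so topics can no longer leak
    return sorted({f["topic"] for f in filters})

filters = [
    {"id": "f1", "topic": "a4e9082f"},
    {"id": "f2", "topic": "5422f4bd"},
    {"id": "f3", "topic": "a4e9082f"},  # several filters may share a topic
]
print(active_topics(filters))  # -> ['5422f4bd', 'a4e9082f']
```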
Ok, thanks to @jakubgs we figured out where these topics are coming from: the push notification servers. Clients should be publishing on those topics, but should not be fetching their messages from mailservers. Whether clients should be subscribed is a different story: you need to be subscribed to publish a message, so that's inevitable, but they should unsubscribe shortly after. That functionality, though, is not implemented in whisper/waku.
The number of topics increased a lot again overnight (to 332, a jump of about 100 topics), and the app re-requested 24 hours of data for all topics even though it should not have, resulting in long blue-spinner times.
Ok, I have at least figured out what these topics are: they are the partitioned topics of the push notification servers. Those are heavy-traffic topics, as they are used by many clients. The two actions would be:
- Use a different scheme for push notification servers
- Don't fetch data from partitioned topics that are not yours (this is currently a bug). We need to create a topic because we want to post to it, but we should never fetch data from it.
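The second fix can be sketched as follows: keep the ability to post to a topic we don't own (such as a push notification server's partitioned topic), but exclude post-only topics from mailserver fetches. The `listen` flag and data shapes are assumptions for illustration, not the real status-go fields:

```python
def topics_to_fetch(filters):
    # only fetch history for topics we actually listen on; post-only topics
    # (e.g. another party's partitioned topic) are created so we can publish,
    # but must never be included in a mailserver request
    return sorted(f["topic"] for f in filters if f.get("listen", False))

filters = [
    {"topic": "5422f4bd", "listen": True},   # our own partitioned topic
    {"topic": "4ad29422", "listen": False},  # PN server's topic, post-only
]
print(topics_to_fetch(filters))  # -> ['5422f4bd']
```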
Both have been addressed. Another issue I ran into: after 24 hours of being offline, my app made 17 requests against the mailserver, each asking for 1000 messages. That means the query returned roughly 17K messages, if everything is correct.
By querying the database directly, I noticed that across the whole history for those topics there are only 11516 envelopes, so something is clearly odd there; investigating.
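A quick back-of-the-envelope check on the discrepancy: with correct cursor-based paging, 11516 stored envelopes at a page size of 1000 should take only 12 requests, not 17.

```python
import math

# 17 requests of 1000 messages each were observed, but the database only
# holds 11516 envelopes for those topics; correct paging would need
# ceil(11516 / 1000) = 12 requests, so some messages are being re-fetched.
envelopes = 11516
page_size = 1000
expected_requests = math.ceil(envelopes / page_size)
print(expected_requests)  # -> 12
```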
Attached the query used. query.log
The code was still using bloom filters: there was yet another place where the topic list was not being passed :(
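This explains the over-fetching: a bloom filter is a probabilistic membership structure, so a bloom-based mailserver query can match envelopes for topics the client never asked for, while an explicit topic list cannot. A toy illustration of the mechanism, simplified from Whisper's 64-byte topic bloom filter (the hashing scheme here is an assumption for illustration, not Whisper's exact one):

```python
import hashlib

BLOOM_BITS = 512  # 64 bytes, as in Whisper's bloom filter

def topic_bits(topic, k=3):
    # derive k bit positions from the topic (simplified scheme)
    h = hashlib.sha256(topic.encode()).digest()
    return {int.from_bytes(h[2 * i:2 * i + 2], "big") % BLOOM_BITS for i in range(k)}

def bloom(topics):
    # union of all topics' bits: the filter sent to the mailserver
    bits = set()
    for t in topics:
        bits |= topic_bits(t)
    return bits

def matches(bloom_bits, topic):
    # an envelope matches if all of its topic's bits are set; different
    # topics can set overlapping bits, producing false positives
    return topic_bits(topic) <= bloom_bits

wanted = ["4ad29422", "a4e9082f"]
b = bloom(wanted)
assert all(matches(b, t) for t in wanted)  # wanted topics always match
# unrelated topics may also match, so the mailserver can return far more
# envelopes than an explicit topic-list query would
```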
Still relevant @cammellos ?
Closing due to inactivity and moving to wakuv2; recheck label added for future re-consideration.