cabal-core Message ordering

Message ordering

Open okdistribute opened this issue 4 years ago • 10 comments

Please feel free to disregard if you don't want to go ahead with this, but this is a small tweak that I'd like to propose that I think will improve the reliability of cabal especially for mobile, where clocks are less reliable.

The messages in cabal are currently ordered by 'real' timestamp and also here. This poses an issue for the often case where 'real' clocks are not reliable, if you go offline for 24+hrs or a few days, you'll start to notice your electronics will suffer from 'clock drift' .. or sometimes will just reset randomly to some date in January 2017 or something like that.

An improvement on this is where you take a number and add it to the 'real' clock time, and this number is based upon other numbers you have seen for messages in your log. This is called a vector clock. A simpler implementation can be done as a lamport clock. Just a silly old European way of taking a simple idea and putting someone's name on it, but it's pretty straightforward.

This is a quick watch and a really useful one if you have time: https://www.dotconferences.com/2019/12/james-long-crdts-for-mortals

The benefits for Cabal would be pretty unnoticed for most people especially us, since we are all usually connected to the Internet. But it might also help for messages that come 'back in time' as it would be more clear which exact message before (based on the number), that the person was responding to or messaging after.

Curious what people think about this change?

Feb 18 '20 19:02 okdistribute

@okdistribute I agree that clock drift is problematic. IRC "solves" this by being realtime and not storing messages, which we don't have the luxury of.

This would mean including a vector clock timestamp in each message sent, right? The clock will grow linearly with N, the number of participants in the cabal. I'm concerned about this. A cabal with 1000 users (over its lifetime) and a per-clock size of 4 bytes (32-bit unsigned int), means every cabal message has an additional overhead of ~4kb. A cabal with 10k messages will then contain up to 4mb of vector clock data.

Another approach, which hyperlog used (and we use in Mapeo) is to include an array of links to the message IDs that are considered to be at the current head of the message graph. This has a tendency to converge toward 1-link-per-message. (I can expand on this if that doesn't make sense to some folx, but I know @okdistribute you're already familiar with this data structure). The benefit is that this won't grow with the number of participants, but instead with the # of forks at any given time. A message in cabal can be referenced by a public key and a sequence number, which comes to ~36 bytes as the average overhead on messages.

Feb 18 '20 20:02 hackergrrl

OK thanks @noffle, that makes sense. Does that cause performance tradeoffs for indexing or displaying the most recent messages in order? Just curious as usually disk space & performance have tradeoffs.

Feb 18 '20 21:02 okdistribute

A simple approach to experiment with is using a single monotonically increasing clock for all users. When you post a message, give it the current time or the highest time you've seen in any other message (plus one), whichever is higher.

Sorting messages by this value puts them into one of the possible causal orderings.

However, a bad actor can post a very high timestamp and cause everyone trouble with integer overflow. Maybe Cabal users trust each other enough to not do that? Or maybe it's possible to ignore messages with times too far in the future.

This does not give the same richness of causal information as the backlinks approach described by @noffle . That extra info might enable a better user experience when old messages arrive?

Feb 19 '20 01:02 cinnamon-bun

Oops, I think I reinvented Lamport clocks. 😬 I guess it's time for me to read some theory...

Feb 19 '20 01:02 cinnamon-bun

@okdistribute compared to vector clocks, which you can easily write a sort function for, the links approach would require an index to do the sorting. Once sorted though, querying would be the same as it is currently (a leveldb range query).

I'm not sure what the algorithm would be for this. In Mapeo we'd iterate over historic records by literally following the links, which would probably be much too slow for displaying a list of messages in realtime.

Feb 19 '20 01:02 hackergrrl

when i was in berlin, peg linked me a paper on bloom clocks. i haven't done any deep reading of the paper, so i can't really say anything about tradeoffs, but i imagine it solves the space-efficiency problem of pure vector clocks. here's also a really nice tutorial on bloom filters peg had lying around

paper abstract

The bloom clock is a space-efficient, probabilistic data structure designed to determine the partial order of events in highly distributed systems. The bloom clock, like the vector clock, can autonomously detect causality violations by comparing its logical timestamps. Unlike the vector clock, the space complexity of the bloom clock does not depend on the number of nodes in a system. Instead it depends on a set of chosen parameters that determine its confidence interval, i.e. false positive rate. To reduce the space complexity from which the vector clock suffers, the bloom clock uses a 'moving window' in which the partial order of events can be inferred with high confidence. If two clocks are not comparable, the bloom clock can always deduce it, i.e. false negatives are not possible. If two clocks are comparable, the bloom clock can calculate the confidence of that statement, i.e. it can compute the false positive rate between comparable pairs of clocks. By choosing an acceptable threshold for the false positive rate, the bloom clock can properly compare the order of its timestamps, with that of other nodes in a highly accurate and space efficient way.

from wikipedia:

In 2019, Lum Ramabaja developed Bloom Clocks, [7] a probabilistic data structure whose space complexity does not depend on the number of nodes in a system. If two clocks are not comparable, the bloom clock can always deduce it, i.e. false negatives are not possible. If two clocks are comparable, the bloom clock can calculate the confidence of that statement, i.e. it can compute the false positive rate between comparable pairs of clocks.

that said i've never heard of bloom filters being used productively lol

Feb 19 '20 10:02 cblgh

This is all useful information and the indexing/sorting problem makes me think it might be worth the extra 4mb in the case that a group gets to 1000k+ people to reduce sorting time on messages.

Implementing a bloom clock/filter could be cool but seems like more work

Feb 19 '20 18:02 okdistribute

extra 4mb in the case that a group gets to 1000k+ people

It was 1k users @ 10k messages. So even with a modest number of users, the cabal's size will grow much faster than a timestamping method that is independent of the number of participants. Since this is a db change and not just a protocol change, it will be harder to modify in the future.

@okdistribute have you noticing this problem with clock drift happening much in cabal?

Another solution could be a node using cabal peers as NTP servers. A node could look at the reported system time of its peers, and decide if it wants to offset its own clock to try and match theirs. It wouldn't be a trusted clock, but might be good enough for a medium-trust environment.

Another trick, which I learned from writing realtime multiplayer games, is to figure out your peers' clock offset, and apply those offsets locally. The idea is that when connecting to a peer, you share your system time with each other. You can use this to figure out the offset of that peer's clock relative to yours, and apply that to the messages you receive from them. This is cool because it doesn't matter what time anyone's clock is set to: each peer has a subjective relative view! The downside is that you'd only know offsets for peers you can directly connect to. There might be some kind of scheme where you receive transitive information about other peers' reported offsets, but I haven't thought that through & there's always security challenges when you're trusting other peers' subjective reports.

Feb 19 '20 20:02 hackergrrl

@noffle like I said, I haven't noticed it because I'm always online. It's more of a its-not-a-problem-until-it-is, you can leave it the same and probably nothing bad will happen until someone's messages are sorted at the beginning of time, and no one sees them, but then no one would probably notice

To note, that's 4mb if you download the entire history of the cabal, which hopefully wouldn't have to be the case if we have an easy way to request the last X messages (which may be easier if they're sorted by a number!)

Feb 19 '20 20:02 okdistribute

@noffle revisiting this as some folks today noticed their clocks were a couple seconds off from each other while chatting.. XD

using cabal peers as NTP servers is an interesting hack for sure!

Jul 08 '20 00:07 okdistribute

cabal-core cabal-core copied to clipboard

Message ordering

cabal-core
cabal-core copied to clipboard