antidote
Time handling in antidote
With the change to Erlang 19 we removed all calls to `erlang:now`.
There are still some open problems with this change and the handling of time in general:
- [ ] Monotonicity: Replacing `erlang:now` with `erlang:system_time` (as we did) should work, as `system_time` is still monotonic with the default time warp setting ("No Time Warp Mode"). However, this means that our code is not time warp safe and would be incorrect if Erlang is started with time warp mode enabled (which is recommended).
- [ ] Uniqueness: Most old uses of `erlang:now` do not need uniqueness. I think it might still be required for generating transaction-ids (in `clocksi_interactive_tx_coord_fsm`, line 171). There might be other places.
- [ ] Restarts: `erlang:now` and `erlang:system_time` seem to lose their guarantees after a system restart. So after a restart we might get an older time stamp, which could break the protocol.
Is someone assigned to work on this?
Not a fix, but a suggestion is to run NTP on all nodes in a DC before any of the erlang VMs are started.
My understanding is that "time warp" is for large clock corrections, but there is also "time correction" which adjusts the erlang clock frequency by a small amount without violating monotonicity. So if system clocks are synced before the start and continually then the hope is that "time correction" will be sufficient. Though there might be other downsides of using no time warp mode? Maybe worth testing.
The uniqueness of txids and the behaviour after restarts that you mentioned do look like issues that could happen, but they should be easy to fix, I guess.
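To check which mode a node is actually running in before testing, something like the following might help. It is only a sketch: the module name `time_mode_check` and the function `report/0` are made up for illustration, while the `erlang:system_info/1` items are the ones described in the Erlang time correction documentation.

```erlang
-module(time_mode_check).
-export([report/0]).

%% Collect the time-related settings of the running VM.
%% time_warp_mode:  no_time_warp | single_time_warp | multi_time_warp
%% time_correction: true | false
%% time_offset:     preliminary | final | volatile
report() ->
    #{time_warp_mode => erlang:system_info(time_warp_mode),
      time_correction => erlang:system_info(time_correction),
      time_offset => erlang:system_info(time_offset)}.
```

On a node started with default arguments I would expect `time_warp_mode` to be `no_time_warp`, but that is worth confirming on the actual deployment.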
Looking at the new Erlang API for time correction, I think we can fix this issue now.
The time functions currently used are:
- `bcounter_mgr.erl`: `erlang:timestamp()` (4 times)
- `dc_utilities.erl`: `erlang:system_time(micro_seconds)`

and two calls to `rand:seed`:
- `clocksi_interactive_coord.erl`, line 528
- `interactive_dc_query_receive_socket.erl`, line 107
The generation of transaction IDs is currently handled by the call to the dc_utilities function.
Looking at both the current Erlang time correction documentation and random numbers documentation, we could do the following:
- Set the time warp VM argument: `+C multi_time_warp`
- Use a tuple to create strictly monotonic timestamps (for dc_utilities), which will also uphold the guarantees we need after a restart:
Time = erlang:monotonic_time(),
UMI = erlang:unique_integer([monotonic]),
EventTag = {Time, UMI}
- According to the `rand` documentation, calling the seed is not needed. The process state is seeded once when the `rand` module is called for the first time. So we could remove the two `rand:seed` calls.
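The tuple idea from the list above could be wrapped roughly like this. This is a sketch, not the actual `dc_utilities` code: the module name `event_tag` and both function names are hypothetical. One caveat to verify: monotonic time and unique integers are only ordered within a single runtime instance, so the second function adds `erlang:time_offset()` to obtain a system time value that can be compared with timestamps from before a restart. The unit atom `microsecond` assumes a recent OTP release; older releases spell it `micro_seconds`.

```erlang
-module(event_tag).
-export([new/0, to_system_micros/1]).

%% Build a tag that is strictly increasing on this runtime instance.
%% The monotonic time must come first so that tuple comparison
%% orders tags by time and uses the unique integer as a tie breaker.
new() ->
    Time = erlang:monotonic_time(),
    UMI = erlang:unique_integer([monotonic]),
    {Time, UMI}.

%% Translate the time part of a tag into Erlang system time in
%% microseconds, using the time offset at the moment of the call.
to_system_micros({Time, _UMI}) ->
    SystemNative = Time + erlang:time_offset(),
    erlang:convert_time_unit(SystemNative, native, microsecond).
```

Two tags created one after the other in the same process compare as `T1 < T2`, even if the monotonic clock did not advance between the calls.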
Does this solve the problems we currently have with time handling?
I do not really know how the bounded counter manager works, so I'd need input on what guarantees it needs. Currently the bounded counter manager uses `erlang:timestamp`. The `timestamp` function does not give any guarantees (neither monotonicity nor uniqueness) to my knowledge.
@balegas ?
I was checking the code and I think that timestamps in the bounded counter manager are used to time out resource transfer requests. It does not depend on timestamps for identifiers or for ordering.
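If the timestamps are only used for timeouts, measuring elapsed time with the monotonic clock would probably be enough. A minimal sketch, assuming a millisecond timeout; the module `request_timeout` and its functions are hypothetical and not taken from the bounded counter manager:

```erlang
-module(request_timeout).
-export([start/0, expired/2]).

%% Remember when a request was issued (native time unit).
start() ->
    erlang:monotonic_time().

%% True once at least TimeoutMs milliseconds have passed since Start.
%% Monotonic time never jumps backwards, so Elapsed is never negative.
expired(Start, TimeoutMs) ->
    Elapsed = erlang:monotonic_time() - Start,
    erlang:convert_time_unit(Elapsed, native, millisecond) >= TimeoutMs.
```

This avoids depending on wall-clock adjustments, at the cost that the stored start value is only meaningful on the node that created it.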
Is this issue still open? I implemented a very simple solution for this problem in gingko, but I don't know if it performs well. Basically, I used a gen_server that made sure that every new timestamp (regular Erlang timestamp translated to microseconds) was strictly monotonic; if it was not, the last timestamp was incremented by one and used instead.
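For readers who want to see the shape of that approach, here is a rough sketch. It is not the gingko code: the module name `monotonic_ts_server` and the function `next/0` are invented for illustration, and `erlang:system_time(microsecond)` stands in for the "regular Erlang timestamp translated to microseconds".

```erlang
-module(monotonic_ts_server).
-behaviour(gen_server).

-export([start_link/0, next/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Return a timestamp in microseconds that is strictly greater than
%% every timestamp previously handed out by this server.
next() ->
    gen_server:call(?MODULE, next).

%% The state is the last timestamp handed out.
init([]) ->
    {ok, 0}.

handle_call(next, _From, Last) ->
    Now = erlang:system_time(microsecond),
    %% Use the clock value if it advanced, otherwise last value plus one.
    Next = max(Now, Last + 1),
    {reply, Next, Next}.

handle_cast(_Msg, Last) ->
    {noreply, Last}.
```

Since every caller goes through one process, this serializes timestamp generation, which is the likely performance concern. The state is also lost on restart, so it does not by itself address the restart problem from the original list.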