feat: Introduce Raftify and `RaftGlobalTimer` to replace `DistributedGlobalTimer`

Open jopemachine opened this issue 2 years ago • 0 comments

This PR aims to resolve the distributed locking issues by integrating Raftify with the Backend.AI manager, using a GlobalTimer that runs on the Raft algorithm.

Any kind of feedback is welcome.

This PR also partially resolves https://github.com/lablup/backend.ai/issues/1634

How to set up the test environment

In manager.toml, set num-proc to any number other than 1 and configure the raft section before running the Backend.AI manager.
Then create raft-cluster-config.toml and set initial_peers there. An example follows.

[[peers.other]]
host = "192.168.0.1"
port = 60151
node-id = 1
role = "voter"

[[peers.other]]
host = "192.168.0.1"
port = 60152
node-id = 2
role = "voter"

[[peers.other]]
host = "192.168.0.1"
port = 60153
node-id = 3
role = "voter"

[[peers.myself]]
host = "192.168.0.2"
port = 60154
node-id = 4

[[peers.myself]]
host = "192.168.0.2"
port = 60155
node-id = 5

Testing and debugging

To put a new log entry:

curl -XGET http://localhost:60251/put/1/test
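
The same request can be issued from a script when several entries need to be written during testing. This is a rough sketch using only the Python standard library. The names `put_url` and `put_entry` are hypothetical, and reading the two path segments as a key and a value is an assumption; the PR only shows the URL shape `/put/1/test`.

```python
import urllib.parse
import urllib.request


def put_url(key: int, value: str, host: str = "localhost", port: int = 60251) -> str:
    """Build the debug URL in the same shape as the curl command above.

    Treating the segments as key and value is an assumption, not documented behavior.
    """
    return f"http://{host}:{port}/put/{key}/{urllib.parse.quote(value, safe='')}"


def put_entry(key: int, value: str, timeout: float = 5.0) -> str:
    """Send the GET request and return the response body as text."""
    with urllib.request.urlopen(put_url(key, value), timeout=timeout) as resp:
        return resp.read().decode()
```

With a manager running locally, `put_entry(1, "test")` would send the same request as the curl command.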

To print all persisted log entries:

./backend.ai mgr raft debug persisted-all ./logs

Aug 25 '23 03:08 jopemachine