manticoresearch: save seqno periodically to allow IST on node restart after a crash

Galera uses IST on node restart only if the node's seqno was saved properly at node shutdown; otherwise Galera issues an SST for the node to join the cluster. That is why a node that crashed and then restarts goes through SST, which can take a long time and slow the cluster down due to intensive file and network I/O.
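To illustrate the condition above, here is a minimal sketch that checks whether a node has a saved position to offer on restart. It assumes the standard Galera `grastate.dat` layout, where `seqno` stays at `-1` if the node did not shut down cleanly. The helper names are hypothetical, and the check covers only the local side: IST also depends on the donor still having the missing write-sets cached.

```python
def parse_grastate(text: str) -> dict:
    """Parse the 'key: value' lines of a Galera grastate.dat file."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def has_saved_position(text: str) -> bool:
    """Return True if the file records a usable seqno (anything but -1)."""
    state = parse_grastate(text)
    try:
        return int(state.get("seqno", "-1")) >= 0
    except ValueError:
        return False


# Example file contents (values are made up for illustration).
CLEAN = """# GALERA saved state
version: 2.1
uuid:    6f2c1a9e-0000-0000-0000-000000000001
seqno:   1234
safe_to_bootstrap: 0
"""
# After a crash the seqno was never written back, so it is still -1.
CRASHED = CLEAN.replace("seqno:   1234", "seqno:   -1")

print(has_saved_position(CLEAN))    # node can ask for IST
print(has_saved_position(CRASHED))  # node falls back to SST
```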

It could be better to call Galera to save its state from time to time, or to keep the cluster seqno separately in the manticore.json file and pass that seqno to Galera on node restart. However, this needs investigation: after the restart, the node might receive from the cluster via IST some transactions that it has already replayed from the binlog.
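A rough sketch of the second option, under stated assumptions: the seqno is written periodically into a JSON file using an atomic write-then-rename, read back on restart, and used to skip transactions that were already applied. The `clusters` / `seqno` key layout and all function names are hypothetical and are not the actual manticore.json schema or a Galera API.

```python
import json
import os
import tempfile


def save_seqno(path: str, cluster: str, seqno: int) -> None:
    """Persist the last applied seqno of a cluster, replacing the file atomically."""
    data = {}
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    # Hypothetical layout: {"clusters": {"<name>": {"seqno": N}}}
    data.setdefault("clusters", {}).setdefault(cluster, {})["seqno"] = seqno

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data hits disk before the rename
    os.replace(tmp_path, path)  # a crash leaves either the old or the new file


def load_seqno(path: str, cluster: str) -> int:
    """Return the saved seqno, or -1 if nothing was saved for this cluster."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return -1
    return data.get("clusters", {}).get(cluster, {}).get("seqno", -1)


def filter_unapplied(transactions, applied_seqno: int):
    """Drop (seqno, payload) pairs that were already replayed locally."""
    return [t for t in transactions if t[0] > applied_seqno]
```

The open question from the paragraph above maps to `filter_unapplied`: if the saved seqno lags behind what the binlog replayed, IST can resend transactions that are already applied, so the node would need some way to detect and skip them.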

Feb 28 '24 10:02 tomatolog

To continue the development, it could be better to wait for the refactoring of the binlog code at https://github.com/manticoresoftware/manticoresearch/issues/879 and then use a new binlog instance that just saves the seqno from the clusters. As for now, the cluster could write its seqno into the binlog along with the index transactions, but the binlog can later be truncated once all indexes have been saved, which means the cluster seqno is lost after that truncation.

After the refactoring of the binlog code, it could be easier to use a dedicated binlog for all clusters that saves the cluster seqno for every transaction and has meta at the end of the last binlog file with the seqno of every cluster.
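As a toy model of that idea (not the real binlog format, which this thread does not describe), the sketch below keeps one record per replicated transaction plus a meta section holding the latest seqno of each cluster. The class and method names are hypothetical, and it is an assumption here that truncation would drop the per-transaction records while keeping the meta.

```python
class SeqnoLog:
    """In-memory model of a dedicated binlog that stores only cluster seqnos."""

    def __init__(self):
        self.records = []  # (cluster, seqno), one entry per replicated transaction
        self.meta = {}     # latest seqno per cluster, the "meta at the end"

    def append(self, cluster: str, seqno: int) -> None:
        """Record a transaction and refresh the meta for its cluster."""
        self.records.append((cluster, seqno))
        self.meta[cluster] = max(seqno, self.meta.get(cluster, -1))

    def truncate(self) -> None:
        """Drop per-transaction records; the meta is kept (assumption)."""
        self.records.clear()

    def last_seqno(self, cluster: str) -> int:
        """Return the last known seqno for a cluster, or -1 if unknown."""
        return self.meta.get(cluster, -1)


log = SeqnoLog()
log.append("posts", 10)
log.append("users", 3)
log.append("posts", 11)
log.truncate()
print(log.last_seqno("posts"))  # still known after truncation
```

Unlike the shared binlog described earlier, truncating this log does not lose the cluster position, because the meta is independent of whether the indexes have been saved.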

May 07 '24 13:05 tomatolog

Blocked by https://github.com/manticoresoftware/manticoresearch/issues/879

May 08 '24 08:05 sanikolaev

Blocked by https://github.com/manticoresoftware/manticoresearch/issues/879

Unblocked.

Jul 15 '24 08:07 sanikolaev

Posted related points at https://manticoresearch.slack.com/docs/T5DP19MNZ/F07CTDTKZPU

Jul 17 '24 11:07 tomatolog