save seqno periodically to allow IST on node restart after crash

Open tomatolog opened this issue 1 year ago • 4 comments

Galera uses IST (Incremental State Transfer) on node restart only if the node's seqno was saved properly at shutdown; otherwise Galera issues an SST (State Snapshot Transfer) for the node to join the cluster. That is why a node that crashed and then restarts issues an SST, which can take a long time and slow the cluster down due to intensive file and network I/O.

It would be better to have Galera save its state from time to time, or to keep the cluster seqno separately in the manticore.json file and pass that seqno to Galera on node restart. However, this needs investigation: after restart, the node might receive via IST some transactions from the cluster that it has already replayed from the binlog.
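A minimal sketch of the second option, in Python for brevity rather than Manticore's own C++. Everything here is an assumption for illustration: the function names `save_seqno`, `load_state`, and `filter_unapplied` are hypothetical, and a stand-alone JSON file is used instead of the real manticore.json layout. The write goes through a temporary file and a rename so that a crash in the middle of a save leaves the previous seqno intact. The filter shows one possible guard if IST does turn out to resend transactions that were already replayed from the binlog.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved {cluster: seqno} map, or an empty map if none exists."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_seqno(path, cluster, seqno):
    """Record the last applied seqno for one cluster (hypothetical helper)."""
    state = load_state(path)
    state[cluster] = seqno
    dir_name = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target,
    # so readers see either the old file or the new one, never a partial write.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def filter_unapplied(transactions, applied_seqno):
    """Keep only (seqno, payload) pairs newer than what was already applied."""
    return [(s, p) for s, p in transactions if s > applied_seqno]
```

On restart, the value returned by `load_state` would be the seqno handed to Galera. How far that value may lag behind the binlog depends on how often `save_seqno` is called, which is the trade-off behind "periodically" in the title.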

tomatolog avatar Feb 28 '24 10:02 tomatolog

To continue development, it would be better to wait for the refactoring of the binlog code at https://github.com/manticoresoftware/manticoresearch/issues/879 and then use a new binlog instance that only saves the seqno from the clusters. With the current code, if the cluster writes its seqno into the binlog along with index transactions, the binlog can later be truncated once all indexes have been saved, and the cluster seqno is lost with that truncation.

After the binlog refactoring, it should be easier to use a dedicated binlog for all clusters that saves the cluster seqno for every transaction and keeps meta with the seqno of every cluster at the end of the last binlog file.
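The idea of a dedicated seqno log can be pictured with the rough Python sketch below. It is not the planned Manticore binlog format: the class `SeqnoLog`, the function `recover`, and the one-JSON-object-per-line encoding are all hypothetical, and a single file stands in for the rotated binlog files. Each committed transaction appends its cluster seqno, a clean close appends a meta record with every cluster's seqno, and recovery after a crash scans the records and discards a torn final one.

```python
import json
import os


def recover(path):
    """Return ({cluster: seqno}, byte offset just past the last intact record)."""
    last, good = {}, 0
    if not os.path.exists(path):
        return last, good
    with open(path, "rb") as f:
        for line in f:
            try:
                if not line.endswith(b"\n"):
                    raise ValueError("torn record")
                record = json.loads(line)
            except ValueError:
                break  # incomplete tail left by a crash; ignore it
            if "meta" in record:
                last = dict(record["meta"])
            else:
                last[record["cluster"]] = record["seqno"]
            good += len(line)
    return last, good


class SeqnoLog:
    """Append-only log of per-transaction cluster seqnos (illustrative only)."""

    def __init__(self, path):
        self.last, good = recover(path)
        if os.path.exists(path):
            os.truncate(path, good)  # cut off any torn tail before appending
        self.file = open(path, "ab")

    def commit(self, cluster, seqno):
        """Record the seqno of one applied transaction."""
        self._write({"cluster": cluster, "seqno": seqno})
        self.last[cluster] = seqno

    def close(self):
        """Clean shutdown: append a meta record with the seqno of every cluster."""
        self._write({"meta": self.last})
        self.file.close()

    def _write(self, record):
        self.file.write(json.dumps(record).encode("utf-8") + b"\n")
        self.file.flush()
        os.fsync(self.file.fileno())
```

Because this log is separate from the index binlog, truncating the index binlog after all indexes are saved would not discard the cluster seqno.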

tomatolog avatar May 07 '24 13:05 tomatolog

Blocked by https://github.com/manticoresoftware/manticoresearch/issues/879

sanikolaev avatar May 08 '24 08:05 sanikolaev

Blocked by https://github.com/manticoresoftware/manticoresearch/issues/879

Unblocked.

sanikolaev avatar Jul 15 '24 08:07 sanikolaev

Posted related points at https://manticoresearch.slack.com/docs/T5DP19MNZ/F07CTDTKZPU

tomatolog avatar Jul 17 '24 11:07 tomatolog