[server][improve] Add WAL cache to optimize replication.
Add a WAL LogEntry cache to improve replication. It bypasses the page cache and eliminates deserialization overhead when tail-reading the WAL.
Under ideal conditions, the oxia_server_wal_read_latency_milliseconds_sum can be 0.
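For readers of this thread, here is a minimal sketch of the idea, not the code in this PR. It keeps recently appended entries in decoded form under a byte budget, so a tailing reader can skip the segment file and deserialization on a hit. The names `LogEntry`, `tailCache`, `Append`, and `Get` are hypothetical, and offsets are assumed to be contiguous.

```go
package main

import "fmt"

// LogEntry is a hypothetical stand-in for an already decoded WAL entry.
type LogEntry struct {
	Offset int64
	Value  []byte
}

// tailCache holds the most recently appended entries, oldest first.
// Assumption: offsets are contiguous, so an entry can be found by index.
type tailCache struct {
	entries  []LogEntry
	bytes    int
	maxBytes int
}

func newTailCache(maxBytes int) *tailCache {
	return &tailCache{maxBytes: maxBytes}
}

// Append stores an entry at write time and evicts the oldest entries
// until the cache fits the byte budget (the newest entry is always kept).
func (c *tailCache) Append(e LogEntry) {
	c.entries = append(c.entries, e)
	c.bytes += len(e.Value)
	for c.bytes > c.maxBytes && len(c.entries) > 1 {
		c.bytes -= len(c.entries[0].Value)
		c.entries[0] = LogEntry{} // drop the reference so the value can be collected
		c.entries = c.entries[1:]
	}
}

// Get returns the cached entry for offset. On a miss the caller would
// fall back to reading and decoding from the segment file.
func (c *tailCache) Get(offset int64) (LogEntry, bool) {
	if len(c.entries) == 0 {
		return LogEntry{}, false
	}
	idx := offset - c.entries[0].Offset
	if idx < 0 || idx >= int64(len(c.entries)) {
		return LogEntry{}, false
	}
	return c.entries[idx], true
}

func main() {
	c := newTailCache(8) // tiny budget, only to make eviction visible
	for i, v := range []string{"aaaa", "bbbb", "cccc"} {
		c.Append(LogEntry{Offset: int64(i), Value: []byte(v)})
	}
	_, hitOld := c.Get(0)
	_, hitNew := c.Get(2)
	fmt.Println(hitOld, hitNew) // false true
}
```

A reader that stays close to the write head hits the cache. A reader that falls behind by more than the budget misses and goes back to the segment files.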
Test the WAL via wal-perf
before:
after:
Read/write throughput increases by about 32%.
Perf test after addressing the review comments: the performance is still OK.
LGTM +1
It would be better if you could consider this.
Oxia is a sharded system, so we need to think more about controlling cache memory. Currently, every shard has its own WAL. I am not sure whether or when we will go for sharding WAL (IMO we should, to avoid the mmap cost of many open segments), but either way we should pay attention to the cost.
- We need a global cache to avoid a total of shards_num * 2 MiB (100 shards = 200 MiB, 1_000 shards ~ 2 GiB). This will still be very useful when we migrate to sharding WAL.
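To make the numbers concrete, here is a rough sketch of the arithmetic and of one possible shape for a process-wide budget that per-shard caches reserve from. This is an illustration only. `globalBudget`, `TryReserve`, and `Release` are hypothetical names, not existing Oxia APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// perShardCacheBytes is the per-shard cache size discussed above (2 MiB).
const perShardCacheBytes = 2 << 20

// perShardTotal returns the total memory used when every shard owns a cache.
func perShardTotal(shards int) int64 {
	return int64(shards) * perShardCacheBytes
}

// globalBudget is a hypothetical process-wide limit shared by all shards.
type globalBudget struct {
	mu    sync.Mutex
	used  int64
	limit int64
}

// TryReserve claims n bytes and reports false if the limit would be exceeded.
func (b *globalBudget) TryReserve(n int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.used+n > b.limit {
		return false
	}
	b.used += n
	return true
}

// Release gives n bytes back to the budget, never going below zero.
func (b *globalBudget) Release(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.used -= n
	if b.used < 0 {
		b.used = 0
	}
}

func main() {
	for _, shards := range []int{100, 1000} {
		fmt.Printf("%d shards -> %d MiB\n", shards, perShardTotal(shards)>>20)
	}
	// prints "100 shards -> 200 MiB" and "1000 shards -> 2000 MiB"
}
```

With a shared limit, memory stays bounded no matter how many shards exist, at the price of shards competing for the same bytes.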
I've considered this. Even with a single WAL instance in the future, we would still strive to distribute memory evenly among the shards. Otherwise there will still be cache penetration, which is not much different from the current implementation.
Well... after thinking about it more, I don't think the write cache helps us much in this case, because reads always happen after the data is synced. If the write traffic is very large, the 2 MiB buffer will never work as expected.
Indeed, your benchmark proves some improvement, but its logic differs from Oxia's. Let me change some of the logic to make it match Oxia's implementation. Plus, you could also use your implementation in the cluster benchmark to see whether there is any improvement in oxia_server_wal_read_latency_milliseconds_sum.