Omit L0 Flush
Description
Omit L0 Flush is intended to reduce IO, memory and CPU usage. Value content should not be stored in the MemTable; instead, the MemTable stores the value's offset (within the WAL Log) and size. The WAL therefore also needs to be mmap'ed. The complications are:
- there are padding bytes inside a single WAL Log entry when the entry straddles a page boundary
- so we need to implement a new WAL Log format which has no padding in any single entry
- truncate and mmap WAL Log file during WAL Log file creation
- RocksDB can have multiple column families, which share the WAL Log but do not share MemTables and SSTs
- SST, MemTable and WAL Log mapping and management are required
- many changes on DB Write code path are required
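The core idea above (the MemTable holds only an offset and size, and the value bytes live in the mmap'ed WAL) can be sketched as follows. This is a minimal illustration under stated assumptions, not ToplingDB or RocksDB code: `ValueRef`, `MmapWal` and `ToyMemTable` are hypothetical names, the "WAL entry" is reduced to the raw value bytes with no header or padding, a single column family is assumed, and POSIX `mmap` is used directly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <string_view>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical: what the MemTable would store instead of the value bytes.
struct ValueRef {
  uint64_t offset;  // byte offset of the value inside the WAL file
  uint32_t size;    // value length in bytes
};

// Hypothetical WAL: the file is truncated to its full size and mmap'ed
// at creation, and entries are appended contiguously (no padding).
class MmapWal {
 public:
  MmapWal(const char* path, size_t capacity) : cap_(capacity) {
    fd_ = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    assert(fd_ >= 0);
    int rc = ftruncate(fd_, static_cast<off_t>(cap_));
    assert(rc == 0);
    (void)rc;
    void* p = mmap(nullptr, cap_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
    assert(p != MAP_FAILED);
    base_ = static_cast<char*>(p);
  }
  ~MmapWal() {
    munmap(base_, cap_);
    close(fd_);
  }
  MmapWal(const MmapWal&) = delete;
  MmapWal& operator=(const MmapWal&) = delete;

  // Copy the value into the mapping and return where it landed.
  ValueRef Append(std::string_view value) {
    assert(pos_ + value.size() <= cap_);
    std::memcpy(base_ + pos_, value.data(), value.size());
    ValueRef ref{pos_, static_cast<uint32_t>(value.size())};
    pos_ += value.size();
    return ref;
  }

  // Resolve a reference to the bytes inside the mapping (zero copy).
  std::string_view Read(ValueRef ref) const {
    return std::string_view(base_ + ref.offset, ref.size);
  }

 private:
  int fd_ = -1;
  char* base_ = nullptr;
  size_t cap_;
  uint64_t pos_ = 0;
};

// Hypothetical MemTable: keys map to ValueRef, never to value bytes.
class ToyMemTable {
 public:
  explicit ToyMemTable(MmapWal& wal) : wal_(wal) {}

  void Put(const std::string& key, std::string_view value) {
    index_[key] = wal_.Append(value);
  }

  bool Get(const std::string& key, std::string_view* value) const {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *value = wal_.Read(it->second);
    return true;
  }

 private:
  MmapWal& wal_;
  std::map<std::string, ValueRef> index_;
};
```

In this sketch a `Put` writes the value once, into the WAL mapping, and a `Get` returns a view into that same mapping. This only works because each value is contiguous in the file, which is why the padding-free entry format listed above is a prerequisite.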
Related feature
We have implemented the feature Convert MemTable to L0 SST. It requires MemTableRep to implement a new method, ConvertToSST; CSPPMemTab currently implements it by writing data to a file mmap.
The issue is that, to be reliable, writing data through a file mmap does not reduce IO; it just spreads the IO pressure evenly over the lifetime of the MemTable.
In the best case, we set CSPPMemTab.sync_sst_file=false and let the operating system perform the sync when appropriate. Then, if the file is deleted after L0->L1 compaction while the corresponding page cache pages have not yet been written back to the device, that write-back can be omitted.
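The difference between syncing and not syncing a file written through mmap can be illustrated with a small sketch. It is not CSPPMemTab code: `WriteViaMmap` and its `sync_file` parameter are hypothetical stand-ins for the behavior that `sync_sst_file` selects, and whether the kernel really skips write-back for an unlinked file depends on the operating system.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: write `data` into `path` through a shared mapping.
// With sync_file == false the dirty pages stay in the page cache and the
// kernel decides when (or whether) to write them back.
bool WriteViaMmap(const char* path, std::string_view data, bool sync_file) {
  assert(path != nullptr);
  if (data.empty()) return false;  // a zero-length mmap is invalid
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  if (ftruncate(fd, static_cast<off_t>(data.size())) != 0) {
    close(fd);
    return false;
  }
  void* p = mmap(nullptr, data.size(), PROT_READ | PROT_WRITE, MAP_SHARED,
                 fd, 0);
  if (p == MAP_FAILED) {
    close(fd);
    return false;
  }
  std::memcpy(p, data.data(), data.size());
  bool ok = true;
  if (sync_file) {
    // Reliable mode: force the pages to the device now, paying the IO.
    ok = (msync(p, data.size(), MS_SYNC) == 0);
  }
  munmap(p, data.size());
  close(fd);
  return ok;
}
```

Calling `WriteViaMmap(path, bytes, false)` and later `unlink(path)` mirrors the best case described above: if the file is removed before the kernel has flushed its dirty pages, those pages no longer need to reach the device.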