[Improve] Change default value for CompactionMaxTimeMillis.
Motivation
The default value of minorCompactionMaxTimeMillis is -1, which lets compaction run with no time limit; this significantly reduces the effectiveness of the compaction mechanism.
Quantify compaction capability
Firstly, we need to build a simple model of compaction to derive a series of quantitative relationships. Obviously, we have:
Dcompact = Vcompact * Tcompact
Dcompact represents the amount of compacted data, i.e. the size of reclaimed storage space, Vcompact represents the compaction speed, and Tcompact represents the compaction time.
As an entry log file contains both valid data and dirty data, we can define a notion for it -- the valid data ratio:
valid data ratio = valid data size / total data size
Before introducing another variable, Dscan, we need to be clear about the compaction procedure.
When the compactor compacts an entry log file, it reads all of the valid data from that file, copies it to a new entry log file, and finally deletes the old file. So the compactor must spend I/O proportional to the valid data size to compact an entry log file.
Therefore, the amount of data the compactor has to read depends on the valid data ratio of the entry log file. Let Dscan be the size of valid data read by the compactor, and p the valid data ratio; then we have:
entry log size = Dscan / p
Dcompact = entry log size * (1-p) = Dscan / p * (1-p)
Dscan is the cost when the compactor compacts an entry log file; Dcompact is the gain. We need to point out that Dcompact is strongly related to the valid data ratio of the entry log file being compacted, as the following graph shows:

[Graph: Dcompact (gain) versus valid data ratio p, for a fixed Dscan]
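To make the trade-off concrete, here is a small illustrative sketch (not BookKeeper code) that evaluates Dcompact = Dscan / p * (1 - p) for a fixed 1 GB scan budget:

```java
// Illustrative only: for a fixed I/O budget Dscan, the reclaimed space
// Dcompact = Dscan / p * (1 - p) collapses as the valid data ratio p
// approaches 1.
public class CompactionGain {
    // Reclaimed bytes for a given scan budget and valid data ratio p.
    static double reclaimed(double scanBytes, double p) {
        return scanBytes / p * (1 - p);
    }

    public static void main(String[] args) {
        double scanBudget = 1_000_000_000.0; // read 1 GB of valid data
        for (double p : new double[] {0.1, 0.2, 0.5, 0.9}) {
            System.out.printf("p=%.1f -> reclaimed %.0f MB%n",
                    p, reclaimed(scanBudget, p) / 1_000_000);
        }
        // At p=0.1 the same 1 GB of reads reclaims ~9000 MB,
        // while at p=0.9 it reclaims only ~111 MB.
    }
}
```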
Why does an unlimited compaction time reduce the effectiveness of the compaction mechanism?
When the compactor starts a compaction routine, it aggregates the valid data ratios of all entry log files belonging to this bookie into a map, then tries to compact the entry log file with the smallest valid data ratio, then the next smallest, and so on. Once built, this map is not updated dynamically; a new one is created only in the next compaction routine. As time goes by, the valid data ratios of the entry log files gradually decrease, and some newly created entry log files do not even exist in the map. So if we don't limit the compaction time, the compactor will very likely end up trying to compact an entry log file with a 90% valid data ratio while ignoring files at 10%, 20%, and so on. Compacting an entry log file with a 90% valid data ratio is very hard work and gains almost nothing.
This is why we can't let the compaction time be so large that the compactor ends up doing the hard work while ignoring the easy work. But we can't set it too small either, since the aggregation routine itself takes time. One hour is a good value.
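The routine described above can be sketched roughly as follows; the class and method names are illustrative, not the actual BookKeeper GarbageCollectorThread API:

```java
import java.util.*;

// Sketch of the selection loop: sort entry logs by valid data ratio
// (cheapest first) and compact until the time budget expires.
public class CompactionLoopSketch {
    // Entry logs whose valid data ratio is below the compaction
    // threshold, ordered cheapest (lowest ratio) first.
    static List<Long> pickLogsToCompact(Map<Long, Double> ratioByLogId,
                                        double threshold) {
        List<Long> candidates = new ArrayList<>();
        ratioByLogId.entrySet().stream()
                .filter(e -> e.getValue() < threshold)
                .sorted(Map.Entry.comparingByValue())
                .forEach(e -> candidates.add(e.getKey()));
        return candidates;
    }

    // Simulated clock: each log costs millisPerLog; with a negative
    // maxTimeMillis (the old -1 default) the loop never stops early.
    static int runCompaction(List<Long> logs, long maxTimeMillis,
                             long millisPerLog) {
        long now = 0;
        int compacted = 0;
        for (Long id : logs) {
            if (maxTimeMillis >= 0 && now >= maxTimeMillis) {
                break; // time budget spent; stop this routine
            }
            now += millisPerLog;
            compacted++;
        }
        return compacted;
    }
}
```

With a time budget, each routine compacts only the cheapest files and then rebuilds a fresh ratio map in the next routine; with maxTimeMillis = -1 it keeps walking the stale map into the expensive, high-ratio files.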
Changes
Change the default value of minorCompactionMaxTimeMillis and majorCompactionMaxTimeMillis to 1h.
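For reference, assuming the standard bookie configuration file (e.g. conf/bk_server.conf), explicitly setting the new defaults would look like this (1 hour = 3600000 ms):

```properties
# Abort a minor/major compaction run after one hour.
# -1 (the old default) meant no time limit.
minorCompactionMaxTimeMillis=3600000
majorCompactionMaxTimeMillis=3600000
```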
I think we should add it to the history docs as well.
Sorry, that's comment for #4355