map/set/zset/list data_cf distributed compaction
data_cf distributed compaction needs `DB::Get` from the default cf in the `CompactionFilter`, so we bulk load all hash keys from the default cf and save them to the shared filesystem for reading in dcompact workers.
When the compaction input of data_cf is small but the hash keys in the default cf are large, this is a very big waste --- this is likely to happen when compacting the upper levels of data_cf.
So we should check the data size of the default cf before starting a remote compaction; if it reaches a threshold percentage of the compaction input size of data_cf, the compaction should fall back to running locally.
- Use `Compaction::column_family_data()` to get the `default` cf handle and DB ptr; this needs a global `std::map`
- Use `DB::GetApproximateSizes()` to get the size of the `default` cf (see the sketch after this list)
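A minimal sketch of these two steps, assuming the internal headers are available (this code lives inside the db engine source tree); the registry names `g_cf_to_db`, `RegisterDataCf`, and `DefaultCfSize` are hypothetical, and the registry must be populated at DB open time:

```cpp
// Hypothetical registry: Compaction::column_family_data() only yields the
// data_cf's ColumnFamilyData*, so a global map is needed to reach the owning
// DB and its default cf handle.
#include <map>
#include <mutex>
#include "db/compaction/compaction.h"
#include "rocksdb/db.h"

struct DbAndDefaultCf {
  rocksdb::DB* db;
  rocksdb::ColumnFamilyHandle* default_cf;
};
static std::mutex g_map_mutex;
static std::map<const rocksdb::ColumnFamilyData*, DbAndDefaultCf> g_cf_to_db;

// Call once per DB, after open, before any compaction can run.
void RegisterDataCf(const rocksdb::ColumnFamilyData* data_cfd,
                    rocksdb::DB* db, rocksdb::ColumnFamilyHandle* default_cf) {
  std::lock_guard<std::mutex> lk(g_map_mutex);
  g_cf_to_db[data_cfd] = DbAndDefaultCf{db, default_cf};
}

// Approximate size of the default cf; 0 if the cf is not registered.
uint64_t DefaultCfSize(const rocksdb::Compaction* c) {
  DbAndDefaultCf entry{nullptr, nullptr};
  {
    std::lock_guard<std::mutex> lk(g_map_mutex);
    auto it = g_cf_to_db.find(c->column_family_data());
    if (it == g_cf_to_db.end()) return 0;
    entry = it->second;
  }
  // Whole-keyspace range; the all-0xFF limit assumes a bytewise comparator.
  std::string limit(16, '\xFF');
  rocksdb::Range whole(rocksdb::Slice(), rocksdb::Slice(limit));
  uint64_t size = 0;
  rocksdb::SizeApproximationOptions opts;  // files only, no memtables
  entry.db->GetApproximateSizes(opts, entry.default_cf, &whole, 1, &size)
      .PermitUncheckedError();
  return size;
}
```

If a per-cf total is enough, `DB::GetIntProperty()` with `rocksdb::DB::Properties::kTotalSstFilesSize` on the `default` cf handle would avoid the full-range hack.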
Thus a customized `CompactionExecutorFactory` should be defined -- it should reference the DcompactEtcd factory and forward its methods; the key point is to override `ShouldRunLocal(compaction)`.
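A sketch of such a forwarding factory, assuming the `CompactionExecutorFactory` virtuals look roughly like ToplingDB's (`ShouldRunLocal` / `NewExecutor` / `Name`; the real interface may have more methods and different signatures). `ThresholdDcompactFactory` and `kMaxDefaultCfPercent` are made-up names:

```cpp
#include <memory>
#include "db/compaction/compaction_executor.h"  // CompactionExecutorFactory

class ThresholdDcompactFactory : public rocksdb::CompactionExecutorFactory {
 public:
  explicit ThresholdDcompactFactory(
      std::shared_ptr<rocksdb::CompactionExecutorFactory> etcd)
      : etcd_(std::move(etcd)) {}

  // Run locally when bulk loading the default cf would cost more than the
  // remote compaction saves.
  bool ShouldRunLocal(const rocksdb::Compaction* c) const override {
    uint64_t input = c->CalculateTotalInputSize();  // data_cf input bytes
    uint64_t dflt  = DefaultCfSize(c);              // from the sketch above
    const uint64_t kMaxDefaultCfPercent = 400;      // hypothetical: 4x input
    if (dflt * 100 > input * kMaxDefaultCfPercent)
      return true;                                  // fall back to local
    return etcd_->ShouldRunLocal(c);                // defer to DcompactEtcd
  }

  // Forward everything else (and any remaining virtuals, similarly)
  // to the wrapped DcompactEtcd factory.
  rocksdb::CompactionExecutor* NewExecutor(const rocksdb::Compaction* c) override {
    return etcd_->NewExecutor(c);
  }
  const char* Name() const override { return etcd_->Name(); }

 private:
  std::shared_ptr<rocksdb::CompactionExecutorFactory> etcd_;
};
```

Wrapping the DcompactEtcd factory rather than subclassing it keeps the etcd-specific logic untouched; only the local-vs-remote decision changes.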