OnFlushCompleted should update file discardable size
Currently we only update file discardable size in OnCompactionCompleted. This assumes blob file data is never discarded during flush, which is not true. Consider the following case:
- There's a key "foo" in blob file b1.
- GC rewrites b1 into b2 and inserts "foo" into the memtable.
- Before the memtable is flushed, the user deletes "foo".
- After the flush, "foo" is removed from the flush output, since it was deleted. The fact that "foo" is discarded is not reflected in b2's discardable size.
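To make the case concrete, here is a minimal model of what the flush output references, assuming a memtable that keeps only the latest entry per key (real memtables keep multiple versions). `MemEntry` and `BlobBytesInFlushOutput` are hypothetical names, not Titan APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified model (not Titan code). A memtable entry is either
// a blob index pointing into a blob file or a deletion tombstone.
struct MemEntry {
  bool is_tombstone;
  uint64_t blob_file_number;  // meaningful only when !is_tombstone
  uint64_t blob_bytes;        // size of the referenced blob record
};

// Returns, per blob file number, how many blob bytes are still referenced by
// the flush output. Tombstoned keys carry no blob index, so their blob records
// are not counted for any file.
std::map<uint64_t, uint64_t> BlobBytesInFlushOutput(
    const std::map<std::string, MemEntry>& memtable) {
  std::map<uint64_t, uint64_t> referenced;
  for (const auto& kv : memtable) {
    const MemEntry& e = kv.second;
    if (!e.is_tombstone) {
      referenced[e.blob_file_number] += e.blob_bytes;
    }
  }
  return referenced;
}
```

With "foo" tombstoned before the flush, b2's record for "foo" is garbage, yet no blob index into b2 reaches an SST file, so compaction-time accounting in OnCompactionCompleted never sees it.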
I don't know how to fix this one, since OnFlushCompleted doesn't know the input data size for each blob file.
@yiwu-arbug I think we can use the output data size to compute the discardable_size of a flushed blob file:
- If discardable_size == 0, then discardable_size = file_size - output_size
- If discardable_size > 0, then discardable_size = discardable_size - output_size (for blob file indexes that are not flushed completely in one run)
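A minimal sketch of the rule above for a single blob file, assuming `output_size` is the number of bytes of that file still referenced by blob indexes in one flush's output. The function name is hypothetical, and how `output_size` would be collected is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the new discardable_size of one blob file after
// a flush whose output still references `output_size` bytes of that file.
uint64_t UpdateDiscardableOnFlush(uint64_t file_size,
                                  uint64_t discardable_size,
                                  uint64_t output_size) {
  if (discardable_size == 0) {
    // First flush touching this file: everything not in this output is
    // provisionally treated as discardable.
    assert(output_size <= file_size);
    return file_size - output_size;
  }
  // An earlier flush over-counted bytes whose indexes were still sitting in a
  // later memtable; they appear in this output, so subtract them back.
  assert(output_size <= discardable_size);
  return discardable_size - output_size;
}
```

For a 100-byte file whose indexes are spread over two memtables, a first flush carrying 30 bytes sets discardable_size to 70, and a second flush carrying another 50 bytes brings it to 20, matching the 20 bytes that never reached any flush output.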
The problem is that if all keys of a blob file are deleted in the memtable, the flush output contains no index into that file, so we still lose its discardable_size.
@wujy-cs Sounds reasonable. So basically: a) record the latest memtable id (say, M) after GC finishes, b) wait until M is flushed, then subtract the sum of the output sizes of all those memtables from the file size?
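Steps a) and b) can be sketched for one GC output file as below. `GcOutputTracker` and `OnFlushForGcOutput` are hypothetical; the sketch assumes memtable ids increase monotonically, flushes are reported in order, and tracking starts before any memtable holding indexes into the file is flushed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-file tracker for a GC output file (not a Titan type).
struct GcOutputTracker {
  uint64_t file_size;
  uint64_t last_memtable_id;  // M: latest memtable id recorded when GC finished
  uint64_t output_sum;        // bytes of this file referenced by flushes so far
  bool done;
  uint64_t discardable_size;
};

// Called once per completed flush, in flush order. `flushed_memtable_id` is
// the largest memtable id covered by the flush; `output_size` is how many
// bytes of this file the flush output still references (0 if none).
void OnFlushForGcOutput(GcOutputTracker& t, uint64_t flushed_memtable_id,
                        uint64_t output_size) {
  if (t.done) return;
  t.output_sum += output_size;
  if (flushed_memtable_id >= t.last_memtable_id) {
    // Every memtable that could hold indexes into this file has been flushed,
    // so whatever never made it into a flush output is garbage.
    assert(t.output_sum <= t.file_size);
    t.discardable_size = t.file_size - t.output_sum;
    t.done = true;
  }
}
```

Because the final value is `file_size - output_sum`, a file whose keys were all deleted in the memtable ends up fully discardable (`output_sum == 0`), even though no flush output references it.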
@yiwu-arbug Yes. And we can do it in OnFlushCompleted without distinguishing GC-rewritten blob files from flushed blob files, since the output_size of a flushed blob file always equals its file_size.
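The same bookkeeping can run for every tracked file in a single flush-completion handler with no GC-versus-flush branch. `TrackedBlobFile`, `HandleFlushCompleted` and the `output_sizes` map (blob file number to referenced bytes) are hypothetical stand-ins, not RocksDB's `OnFlushCompleted` signature.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical state for one blob file awaiting its flush-time update.
struct TrackedBlobFile {
  uint64_t file_size;
  uint64_t last_memtable_id;  // latest memtable that may reference this file
  uint64_t output_sum;        // bytes referenced by flush outputs so far
  bool done;
  uint64_t discardable_size;
};

// Hypothetical handler: there is no check on where a file came from. A file
// written by this flush is fully referenced by its output, so
// file_size - output_sum is 0 for it.
void HandleFlushCompleted(std::map<uint64_t, TrackedBlobFile>& files,
                          uint64_t flushed_memtable_id,
                          const std::map<uint64_t, uint64_t>& output_sizes) {
  for (auto& kv : files) {
    TrackedBlobFile& f = kv.second;
    if (f.done) continue;
    auto it = output_sizes.find(kv.first);
    f.output_sum += (it == output_sizes.end()) ? 0 : it->second;
    if (flushed_memtable_id >= f.last_memtable_id) {
      assert(f.output_sum <= f.file_size);
      f.discardable_size = f.file_size - f.output_sum;
      f.done = true;
    }
  }
}
```

Here a GC output file and a file produced by the flush go through the same loop; only their referenced byte counts differ.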