
OnFlushCompleted should update file discardable size

Open yiwu-arbug opened this issue 6 years ago • 4 comments

Currently we only update file discardable size in OnCompactionCompleted. We assume blob file data would not be discarded during flush, which is not true. Consider the case:

  1. There's a key "foo" in blob file b1.
  2. GC rewrites b1 into b2 and inserts "foo" into the memtable.
  3. Before the memtable is flushed, the user deletes "foo".
  4. After the flush, "foo" is absent from the flush output, since it was deleted. The fact that "foo" was discarded is not reflected in b2's discardable size.
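
The steps above can be replayed in a toy model. This is only a sketch: `BlobRef`, `Memtable`, `FlushOutputSizes`, and the 128-byte value size are hypothetical names and numbers chosen for illustration, not Titan's actual types. It shows that after the delete, the flush output contains no bytes attributed to b2, so nothing tells the accounting that b2's copy of "foo" is now garbage.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical blob reference stored in the memtable in place of the value.
struct BlobRef {
  std::string file;  // blob file name, e.g. "b2"
  uint64_t size;     // bytes the value occupies in that file
};

// Toy memtable: the latest write per key wins; std::nullopt is a tombstone.
using Memtable = std::map<std::string, std::optional<BlobRef>>;

// Sum, per blob file, the bytes still referenced by the flush output.
// Tombstoned keys contribute nothing, so their blob file may not appear at all.
std::map<std::string, uint64_t> FlushOutputSizes(const Memtable& mem) {
  std::map<std::string, uint64_t> out;
  for (const auto& [key, ref] : mem) {
    if (ref.has_value()) out[ref->file] += ref->size;
  }
  return out;
}

// Replays steps 2-4 of the scenario for the single key "foo".
std::map<std::string, uint64_t> ReplayScenario() {
  Memtable mem;
  mem["foo"] = BlobRef{"b2", 128};  // step 2: GC rewrote "foo" into b2
  mem["foo"] = std::nullopt;        // step 3: user deletes "foo" before flush
  return FlushOutputSizes(mem);     // step 4: flush output has no entry for b2
}
```

In this model `ReplayScenario()` returns an empty map: b2 holds 128 discarded bytes, yet the flush reports nothing for b2.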

yiwu-arbug avatar Sep 23 '19 03:09 yiwu-arbug

I don't know how to fix this one, since OnFlushCompleted doesn't know the input data size for each blob file.

yiwu-arbug avatar Sep 23 '19 03:09 yiwu-arbug

@yiwu-arbug I think we can use output data size to get discardable_size of flushed blob file:

  1. If discardable_size == 0, then discardable_size = file_size - output_size
  2. If discardable_size > 0, then discardable_size = discardable_size - output_size (for blob files whose indexes are not all flushed in a single run)

The problem is that if all keys of a blob file are deleted in the memtable, we will still lose its discardable_size.
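
A minimal sketch of the two-case rule above, assuming a hypothetical `BlobFileStats` record and `ApplyFlushOutput` helper (neither is Titan's real API). Like the rule as stated, it treats `discardable_size == 0` as "this file has not been seen by a flush yet".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-blob-file accounting record; not Titan's actual metadata type.
struct BlobFileStats {
  uint64_t file_size = 0;         // total bytes of blob data in the file
  uint64_t discardable_size = 0;  // bytes currently believed to be garbage
};

// Apply the proposed rule after one flush. `output_size` is the number of
// bytes of this blob file that the flush output still references.
inline void ApplyFlushOutput(BlobFileStats& stats, uint64_t output_size) {
  if (stats.discardable_size == 0) {
    // Case 1: first flush touching this file. Everything not referenced by
    // the output is provisionally counted as discardable.
    assert(output_size <= stats.file_size);
    stats.discardable_size = stats.file_size - output_size;
  } else {
    // Case 2: a later flush references more of the same file, so those bytes
    // turn out to be live and are taken back out of the discardable total.
    assert(output_size <= stats.discardable_size);
    stats.discardable_size -= output_size;
  }
}
```

For a 100-byte file referenced by two flushes with 60 and 30 output bytes, this yields a discardable size of 40 and then 10. The caveat still applies: if every key of a file is deleted in the memtable, no flush output mentions the file, the helper is never called for it, and its discardable size stays at 0.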

JiayuZzz avatar Oct 15 '19 02:10 JiayuZzz

@wujy-cs Sounds reasonable. So basically: a) mark the latest memtable ID (say, M) after GC finishes, b) wait until M is flushed, then subtract the sum of the output sizes of all memtables from the file size?

yiwu-arbug avatar Oct 15 '19 04:10 yiwu-arbug

@yiwu-arbug Yes. And we can do it in OnFlushCompleted without distinguishing GC-rewritten blob files from flushed blob files, since the output_size of a flushed blob file always equals its file_size.

JiayuZzz avatar Oct 15 '19 06:10 JiayuZzz