
GC: Check and remove commitProtectionDuration in gcParams

Open ajantha-bhat opened this issue 2 years ago • 2 comments

The GC identify phase has two steps:

  1. Identify live contents and add them to a bloom filter.
  2. Identify expired contents.

Currently, step 2 uses commitProtectionDuration to keep new or in-progress commits from being expired. Instead, we can check each commit against the per-reference cutoff time (the same one used in step 1) and test only the contents that are older than the cutoff time against the bloom filter.

That way, live contents (including new contents) are never tested against the bloom filter.
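To make the proposal concrete, here is a minimal sketch of step 2 using only the per-reference cutoff time. It is not Nessie's GC code: the `Commit` class, the `identify_expired` function, and the field names are hypothetical, and a plain Python set stands in for the bloom filter that step 1 would have filled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified commit: one content object per commit."""
    reference: str
    content_id: str
    commit_time: datetime


def identify_expired(commits, cutoff_by_ref, live_filter):
    """Step 2 sketch: return the content ids considered expired.

    `live_filter` is assumed to be the result of step 1 (a set standing in
    for the bloom filter). Commits at or after their reference's cutoff time
    are skipped, so new contents are never tested against the filter.
    """
    expired = set()
    for commit in commits:
        if commit.commit_time >= cutoff_by_ref[commit.reference]:
            continue  # newer than the cutoff: treated as live, not tested
        if commit.content_id not in live_filter:
            expired.add(commit.content_id)
    return expired


now = datetime(2022, 3, 30, 12, 0, tzinfo=timezone.utc)
cutoff_by_ref = {"main": now - timedelta(days=1)}
commits = [
    Commit("main", "old-unreferenced", now - timedelta(days=3)),
    Commit("main", "old-still-live", now - timedelta(days=2)),
    Commit("main", "just-written", now - timedelta(minutes=1)),
]
# "just-written" is deliberately absent from the filter to show that the
# cutoff check alone keeps it from being expired.
live_filter = {"old-still-live"}

expired = identify_expired(commits, cutoff_by_ref, live_filter)
print(expired)
```

In this sketch only `old-unreferenced` is reported as expired. The sketch covers committed contents only; it does not model data that has been written but not yet committed.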

ajantha-bhat, Mar 30 '22 05:03

That protection-time is meant to protect data that has "just been written"; it's not meant to "protect commits" (as you said). You have to ensure that, e.g., new snapshots in a table-metadata are not expired.

snazy, Mar 30 '22 13:03

> That protection-time is meant to protect data that has "just been written"; it's not meant to "protect commits" (as you said). You have to ensure that, e.g., new snapshots in a table-metadata are not expired.

Agreed, but a just-written commit will have a commit time newer than the cutoff time. So the cutoff time is enough, and a separate configuration is not needed: if I check only the commits that are older than the cutoff time, "just been written" commits will not be checked against the bloom filter either.

ajantha-bhat, Mar 30 '22 14:03