nessie
GC: Check and remove commitProtectionDuration in gcParams
GC identify has two steps:
- Identify live contents and add them to the bloom filter.
- Identify expired contents.
Currently, step 2 uses commitProtectionDuration to prevent new or in-progress commits from being expired.
Instead, we can check each commit against the per-reference cutoff time (the same one used in step 1).
Only contents older than the cutoff time are checked against the bloom filter.
That way, live contents (including new contents) are never tested against the bloom filter.
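A minimal sketch of the proposed step 2, to make the idea concrete. This is not the actual Nessie GC code: `CutoffGcSketch`, `Content`, and `identifyExpired` are hypothetical names, and a plain `Predicate<String>` stands in for the bloom filter built in step 1.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class CutoffGcSketch {

  /** Hypothetical stand-in for a content object and the time of the commit that wrote it. */
  static final class Content {
    final String id;
    final Instant commitTime;

    Content(String id, Instant commitTime) {
      this.id = id;
      this.commitTime = commitTime;
    }
  }

  /**
   * Step 2 under the proposal: only contents committed before the cutoff are
   * tested against the live-content filter. Anything at or after the cutoff is
   * skipped, so no separate protection duration is consulted.
   */
  static List<String> identifyExpired(
      List<Content> contents, Instant cutoff, Predicate<String> mightBeLive) {
    List<String> expired = new ArrayList<>();
    for (Content c : contents) {
      if (!c.commitTime.isBefore(cutoff)) {
        continue; // new or in-progress commit: never checked, never expired
      }
      if (!mightBeLive.test(c.id)) {
        expired.add(c.id); // older than cutoff and not reported live
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    Instant cutoff = Instant.parse("2022-01-10T00:00:00Z");
    // Assumption: a Set replaces the bloom filter for illustration only.
    Set<String> live = Set.of("old-live");
    List<Content> contents = List.of(
        new Content("old-live", Instant.parse("2022-01-01T00:00:00Z")),
        new Content("old-dead", Instant.parse("2022-01-01T00:00:00Z")),
        new Content("just-written", Instant.parse("2022-01-10T01:00:00Z")));
    System.out.println(identifyExpired(contents, cutoff, live::contains));
  }
}
```

In this sketch the "just-written" content is not in the live set, yet it is not reported as expired, because it is never tested at all.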
That protection time is meant to protect data that has "just been written"; it's not meant to "protect commits" (as you said). You have to make sure that, e.g., new snapshots in a table-metadata are not expired.
Agreed, but a just-written commit will have a commit time newer than the cutoff time, so using the cutoff time is enough and no separate configuration is needed. If I check only the commits that are older than the cutoff time, "just been written" commits will also not be checked against the bloom filter.