Fast Dedup: prune unique entries

Open don-brady opened this issue 8 months ago • 8 comments

Motivation and Context

As a complement to the dedup quota feature, it would be nice to have an online maintenance operation that walks the DDT and prunes the oldest entries from the “single” reference table(s) to free up space.

Description

Adds a new zpool ddtprune command that walks the DDT and prunes the oldest entries from the unique class (ref = 1). The amount to prune can be specified in days (-d <days>) or as an overall percentage of all the unique DDT entries (-p <percent>). For the percentage case, an initial pass is made over all the unique class entries to generate a histogram that is then used to find the number of days to go back in order to fulfill the percentage goal.
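The percentage case can be illustrated with a minimal sketch. It is not the code in this PR: the type `prune_hist_t`, the functions `prune_hist_add()` and `prune_hist_cutoff_days()`, the one-bucket-per-day layout, and the 365-day cap are all hypothetical names and choices made for illustration. The first pass fills the histogram; the cutoff is then found by walking from the oldest bucket toward the youngest until enough entries have been counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap: entries older than this share the last bucket. */
#define	PRUNE_HIST_DAYS	365

/* Hypothetical histogram of unique-entry ages, one bucket per day. */
typedef struct prune_hist {
	uint64_t ph_count[PRUNE_HIST_DAYS + 1];
	uint64_t ph_total;
} prune_hist_t;

/* First pass: record one unique entry that is age_days old. */
static void
prune_hist_add(prune_hist_t *ph, uint64_t age_days)
{
	if (age_days > PRUNE_HIST_DAYS)
		age_days = PRUNE_HIST_DAYS;
	ph->ph_count[age_days]++;
	ph->ph_total++;
}

/*
 * Walk from the oldest bucket toward the youngest and return the first
 * age (in days) at which the entries that old or older reach the
 * requested percentage of all recorded entries.
 */
static uint64_t
prune_hist_cutoff_days(const prune_hist_t *ph, unsigned percent)
{
	assert(percent >= 1 && percent <= 100);

	/* Round the target up so the goal is met rather than undershot. */
	uint64_t target = (ph->ph_total * percent + 99) / 100;
	uint64_t pruned = 0;

	for (int day = PRUNE_HIST_DAYS; day > 0; day--) {
		pruned += ph->ph_count[day];
		if (pruned >= target)
			return ((uint64_t)day);
	}
	return (0);
}
```

Because buckets are whole days, the returned cutoff can select more entries than the exact percentage asked for; the sketch treats the percentage as a minimum.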

A prune operation can be canceled by killing (Ctrl-C) the zpool ddtprune process. It will also self-cancel during a zpool export of the pool.

The iteration over the unique DDT entries happens in open context. Once a batch of prune candidates has been collected, it is processed in syncing context by adding the candidates to the existing DDT AVL tree of updates. The candidates are then pruned in ddt_sync_entry() during the ddt_sync_table() phase.
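The collect-then-process split can be sketched in plain C as follows. This is a simplified model under stated assumptions, not the PR's implementation: `entry_t`, `batch_t`, `batch_sync()`, `prune_walk()`, and `BATCH_MAX` are hypothetical, an array stands in for the on-disk unique table, and setting a flag stands in for the real work done through the AVL tree and ddt_sync_entry().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size, kept small for illustration. */
#define	BATCH_MAX	4

/* Hypothetical stand-in for a unique-class DDT entry. */
typedef struct entry {
	uint64_t e_key;
	uint64_t e_age_days;
	int e_pruned;
} entry_t;

/* A batch of prune candidates gathered by the walker. */
typedef struct batch {
	entry_t *b_cand[BATCH_MAX];
	size_t b_count;
} batch_t;

/*
 * Stand-in for the syncing-context step: act on every candidate in the
 * batch, then empty it. Returns the number of entries pruned.
 */
static size_t
batch_sync(batch_t *b)
{
	size_t n = b->b_count;

	for (size_t i = 0; i < n; i++)
		b->b_cand[i]->e_pruned = 1;
	b->b_count = 0;
	return (n);
}

/*
 * Stand-in for the open-context walk: collect entries at least
 * cutoff_days old and hand each full batch to batch_sync().
 */
static size_t
prune_walk(entry_t *entries, size_t nentries, uint64_t cutoff_days)
{
	batch_t b = { .b_count = 0 };
	size_t pruned = 0;

	for (size_t i = 0; i < nentries; i++) {
		if (entries[i].e_age_days < cutoff_days)
			continue;
		assert(b.b_count < BATCH_MAX);
		b.b_cand[b.b_count++] = &entries[i];
		if (b.b_count == BATCH_MAX)
			pruned += batch_sync(&b);
	}
	/* Flush the final, partially filled batch. */
	if (b.b_count > 0)
		pruned += batch_sync(&b);
	return (pruned);
}
```

Batching keeps the walk itself out of the sync path, so only the bounded per-batch work runs in the stand-in for syncing context.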

This feature leverages a 'last modified' timestamp in the 'flat' on-disk DDT entry that indicates the time that the entry was added to the unique class.
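As a rough sketch of how such a timestamp could drive the days-based selection, the helpers below turn a 'last modified' time into an age in whole days and compare it with a cutoff. The function names are hypothetical, and the assumption that the timestamp is stored in seconds is made only for this example; the source does not state the on-disk units.

```c
#include <assert.h>
#include <stdint.h>

#define	SEC_PER_DAY	86400ULL

/*
 * Age of a unique entry in whole days. A timestamp at or after "now"
 * (for example after a clock adjustment) is treated as age zero.
 * Assumes both values are in seconds (an assumption of this sketch).
 */
static uint64_t
unique_entry_age_days(uint64_t now_sec, uint64_t last_modified_sec)
{
	if (last_modified_sec >= now_sec)
		return (0);
	return ((now_sec - last_modified_sec) / SEC_PER_DAY);
}

/* Nonzero if the entry is at least cutoff_days old. */
static int
unique_entry_is_prunable(uint64_t now_sec, uint64_t last_modified_sec,
    uint64_t cutoff_days)
{
	assert(cutoff_days > 0);
	return (unique_entry_age_days(now_sec, last_modified_sec) >=
	    cutoff_days);
}
```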

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.

Notes

During a scrub, dsl_scan_ddt() first scans all the non-unique DDT blocks (it doesn't look at unique class entries). Then, in the top-down scrub, it calls ddt_class_contains() to see if it already scrubbed the block. This only checks for DDT_CLASS_DUPLICATE entries in the ddt_object_lookup(), and returns true if it finds one for the given block. So AFAICT, it never consults the DDT for scrubbing unique class entries during the top-down scrub, only for duplicates, and pruning from the unique class should therefore have no net effect on scrubs. Any block that transitions from the unique to the duplicate class during a scrub will get scrubbed at the point of transition, in ddt_sync_entry().

To Do:

  • [ ] Add man page for zpool ddtprune
  • [ ] Add basic pos/neg ZTS tests for ddt prune

How Has This Been Tested?

  1. Manual testing with 10+ million unique entries. Used artificial aging (ddt_prune_artificial_age=B_TRUE) to get a wider age distribution.
  2. ZTS functional/dedup tests
  3. ztest/zloop testing, which periodically calls ddt_prune_unique_entries()

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [ ] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [ ] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

don-brady • Jun 17 '24 23:06