zfs icon indicating copy to clipboard operation
zfs copied to clipboard

Fast Dedup: Introduce the FDT on-disk format and feature flag

Open allanjude opened this issue 1 year ago • 4 comments

Motivation and Context

The upcoming dedup features require on-disk changes, which means a new feature flag.

Description

Traditionally, dedup objects live directly in the MOS root. While their details vary (checksum, type and class), they are all the same "kind" of thing - a store of dedup entries.

The new features are more varied than that, and are better thought of as a set of related stores for the overall state of a dedup table.

This adds a new feature flag, SPA_FEATURE_FAST_DEDUP. Enabling this will cause new DDTs to be created as a ZAP in the MOS root, named DDT-<checksum>. The is used as the root object for the normal type/class store objects, but will also be a place for any storage required by new features.

This commit adds two new fields to ddt_t, for version and flags. These are intended to describe the structure and features of the overall dedup table, and are stored as-is in the DDT root. In this commit, flags are always zero, but the intent is that they can be used to hang optional logic or state onto for new dedup features. Version is always 1.

For a "legacy" dedup table, where no DDT root directory exists, the version will be 0.

ddt_configure() is expected to determine the version and flags features currently in operation based on whether or not the fast_dedup feature is enabled, and from what's available on disk. In this way, its possible to support both old and new tables.

How Has This Been Tested?

TBD.

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [x] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [x] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [x] I have added tests to cover my changes.
  • [x] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

allanjude avatar Feb 14 '24 14:02 allanjude

[Fast dedup stack rebased to master 3c941d181]

robn avatar May 15 '24 03:05 robn

@allanjude @robn ping. I still think my second comment above marked resolved prematurely.

amotin avatar Jun 10 '24 23:06 amotin

Alright, I think that's done with! Some tests too, goodness.

robn avatar Jun 13 '24 06:06 robn

[Fast dedup stack rebased to master c98295eed]

robn avatar Jun 14 '24 01:06 robn

It looks like the test cases aren't passing on Centos 8-9:

04:12:36.05 ERROR: eval zdb -D testpool | grep -q 'DDT-sha256-zap-unique: 4 entries' exited 1

Could you take a look at those, and also rebase on master to fix the Fedora failures?

tonyhutter avatar Jul 16 '24 18:07 tonyhutter

Looking at the test failures here, and why they pass on Ubuntu but not CentOS

allanjude avatar Jul 26 '24 18:07 allanjude

It looks like the test cases aren't passing on Centos 8-9:

04:12:36.05 ERROR: eval zdb -D testpool | grep -q 'DDT-sha256-zap-unique: 4 entries' exited 1

The culprit was SELinux. Setting xattr=sa solves it by not ending up with a 5th entry for our 4 block file.

allanjude avatar Jul 28 '24 14:07 allanjude

@allanjude @robn thanks for running that down. Can you also look in to the ztest failure, this PR seems to reliably cause it to fail.

behlendorf avatar Jul 30 '24 00:07 behlendorf

Can you also look in to the ztest failure, this PR seems to reliably cause it to fail.

@robn I've asked @ixhamza to take a look at this, and he found that zdb fails here: https://github.com/openzfs/zfs/blob/master/cmd/zdb/zdb.c#L8067 . I think dump_mos_leaks() in zdb just needs to be updated to account for ddt->ddt_dir_object here and then for ddl->ddl_object's in https://github.com/openzfs/zfs/pull/15895.

amotin avatar Aug 03 '24 00:08 amotin

@robn - Mav is right, https://github.com/truenas/zfs/commit/da7bfc8 fixes ztest failure due to MOS objects leak for this PR.

ixhamza avatar Aug 05 '24 20:08 ixhamza

@robn - Mav is right, truenas@da7bfc8 fixes ztest failure due to MOS objects leak for this PR.

Integrated and rebased

allanjude avatar Aug 05 '24 21:08 allanjude

@amotin @ixhamza that looks right, good work.

I'm sorry I didn't see this sooner; I would have pointed you at this chunk of a0339495, which takes care of it (same as what you did, just different order for no reason). We were still checking it for the entire FDT patch series, and hadn't broken it up for individual PRs yet. Sorry for wasting your time :(

robn avatar Aug 05 '24 23:08 robn

Alright, this push adds the basic version of a reworked dedup leak checker in zdb. Its not strictly necessary on this PR, but it lays groundwork for the rest of the patch series.

robn avatar Aug 11 '24 11:08 robn