zfs icon indicating copy to clipboard operation
zfs copied to clipboard

Fast Dedup: “flat” DDT entry format

Open allanjude opened this issue 1 year ago • 5 comments

Motivation and Context

The on-disk and in-memory dedup entry structures are larger than they need to be. Any reduction we can make reduces memory and IO overheads for every entry, which in large dedup tables can be huge.

Description

This slims down the in-memory ddt_entry_t, partly by reorganizing the structure and using narrower types, and partly by moving rarely-used parts out.

This then adds a new variant of the entry format. The traditional format keeps a complete set of 4x DVAs for each possible value of copies= (plus one for the deprecated ditto blocks feature), which makes the in-memory and on-disk entry mostly empty, which is significant wasted overhead. This adds a new “flat” format which only has a single set of DVAs, but can “extend” them if a write requests more (eg writing a block with copies=1, setting copies=2, then copying the block).

How Has This Been Tested?

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [x] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [x] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [x] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [x] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

allanjude avatar Feb 14 '24 14:02 allanjude

[Fast dedup stack rebased to master 3c941d181]

robn avatar May 15 '24 03:05 robn

FYI, added one commit, that adds a birth time field to ddt_flat_phys_t. This is for the upcoming pruning feature, but we want the change here so that we don't need an additional on-disk format change.

robn avatar May 21 '24 05:05 robn

[Fast dedup stack rebased to master c98295eed]

robn avatar Jun 14 '24 01:06 robn

I think this is good to go; I'm just waiting for confirmation internally that this is all we need for the on-disk format changes.

robn avatar Jun 14 '24 02:06 robn

No format changes, but there is a good chunk of code change to make the split between traditional and flat entries a bit easier to work with. Earlier versions of this code assumed ddt_phys_t is the same in both; change was shoehorned into flat quite late to support prune. This is a much better model for calling code to handle the differences.

We believe this is the final on-disk format, and likely the final code change to support it. We've got some stress testing to do but that's it.

robn avatar Jun 17 '24 05:06 robn

@robn - JFYI, the ztest with your latest push is still failing, similar to https://github.com/openzfs/zfs/pull/15895#issuecomment-2269988653:

leaked space: vdev 0, offset 0xe98da000, size 28672
block traversal size 610742272 != alloc 610770944 (leaked 28672)

ztest: '/sbin/zdb -bccsv -G -d -Y -e -y  -p /var/tmp/zloop-run 14775065184965730235' exit code 2

ixhamza avatar Aug 09 '24 09:08 ixhamza

Yeah, on it. The push was actually just a rebase; there's no actual difference dedup-side. The difference now is just that zdb is no longer fit for purpose. I'm right now splicing the relevant bits of the "final" FDT-aware zdb into the right PRs, and that should be that. Should be pushing in the next hour or two.

robn avatar Aug 09 '24 10:08 robn

@ixhamza can you confirm, which branch did you see this leak from ztest/zdb on? I understand it on fdt-rel-log, I don't understand it on fdt-rel-feature, and sorta-maybe for fdt-rel-flat (this PR).

As it is, I have some rework to do on the new zdb. I did the work described above, and then was chasing a leak for a while. I just now figured out what's wrong, but its after midnight and I'm toast. So I'll come back to it in the morning.

robn avatar Aug 09 '24 14:08 robn

@robn - I am just checking the logs from the PR GitHub Actions. They are shown as Zpool-logs-20.04 and Zpool-logs-22.04, which I believe is a typo. If there are no core dump in the zip file, they are most probably failed due to zdb and you can find out the reason in ztest.out. https://github.com/truenas/zfs/commit/302fb34949e319bb529aa6efeeeec005656f56bf - If it's helpful, I use this just to run ztest GitHub actions in a temporary branch.

ixhamza avatar Aug 09 '24 14:08 ixhamza

Last push rebased onto the latest #15892. Small zdb included in this PR to properly count/claim phys blocks extended with additional DVAs.

robn avatar Aug 11 '24 11:08 robn