Fast Dedup: “flat” DDT entry format
Motivation and Context
The on-disk and in-memory dedup entry structures are larger than they need to be. Any reduction we can make reduces memory and IO overheads for every entry, which in large dedup tables can be huge.
Description
This slims down the in-memory ddt_entry_t, partly by reorganizing the structure and using narrower types, and partly by moving rarely-used parts out.
This then adds a new variant of the entry format. The traditional format keeps a complete set of 4x DVAs for each possible value of copies= (plus one for the deprecated ditto blocks feature), which makes the in-memory and on-disk entry mostly empty, which is significant wasted overhead. This adds a new “flat” format which only has a single set of DVAs, but can “extend” them if a write requests more (eg writing a block with copies=1, setting copies=2, then copying the block).
How Has This Been Tested?
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [x] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [x] Documentation (a change to man pages or other documentation)
Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [x] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [x] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain
Signed-off-by.
[Fast dedup stack rebased to master 3c941d181]
FYI, added one commit, that adds a birth time field to ddt_flat_phys_t. This is for the upcoming pruning feature, but we want the change here so that we don't need an additional on-disk format change.
[Fast dedup stack rebased to master c98295eed]
I think this is good to go; I'm just waiting for confirmation internally that this is all we need for the on-disk format changes.
No format changes, but there is a good chunk of code change to make the split between traditional and flat entries a bit easier to work with. Earlier versions of this code assumed ddt_phys_t is the same in both; change was shoehorned into flat quite late to support prune. This is a much better model for calling code to handle the differences.
We believe this is the final on-disk format, and likely the final code change to support it. We've got some stress testing to do but that's it.
@robn - JFYI, the ztest with your latest push is still failing, similar to https://github.com/openzfs/zfs/pull/15895#issuecomment-2269988653:
leaked space: vdev 0, offset 0xe98da000, size 28672
block traversal size 610742272 != alloc 610770944 (leaked 28672)
ztest: '/sbin/zdb -bccsv -G -d -Y -e -y -p /var/tmp/zloop-run 14775065184965730235' exit code 2
Yeah, on it. The push was actually just a rebase; there's no actual difference dedup-side. The difference now is just that zdb is no longer fit for purpose. I'm right now splicing the relevant bits of the "final" FDT-aware zdb into the right PRs, and that should be that. Should be pushing in the next hour or two.
@ixhamza can you confirm, which branch did you see this leak from ztest/zdb on? I understand it on fdt-rel-log, I don't understand it on fdt-rel-feature, and sorta-maybe for fdt-rel-flat (this PR).
As it is, I have some rework to do on the new zdb. I did the work described above, and then was chasing a leak for a while. I just now figured out what's wrong, but its after midnight and I'm toast. So I'll come back to it in the morning.
@robn - I am just checking the logs from the PR GitHub Actions. They are shown as Zpool-logs-20.04 and Zpool-logs-22.04, which I believe is a typo. If there are no core dump in the zip file, they are most probably failed due to zdb and you can find out the reason in ztest.out. https://github.com/truenas/zfs/commit/302fb34949e319bb529aa6efeeeec005656f56bf - If it's helpful, I use this just to run ztest GitHub actions in a temporary branch.
Last push rebased onto the latest #15892. Small zdb included in this PR to properly count/claim phys blocks extended with additional DVAs.