zfs
zfs copied to clipboard
Optimize zfs diff
Motivation and Context
zfs diff was found to be extremely slow to the point of being almost unusable. These changes often improve performance 10 - 100x.
The current implementation of zfs diff uses zfs_obj_to_stats(), which in turn uses zfs_obj_to_path() to get the path to the dnode. The path is then built using zap_value_search() to expand the parents. The problem is zap_value_search() doesn't do a lookup, it does a linear search through the entire directory object, which leads to n^2 behavior.
Description
This changeset adds the ioctl ZFS_IOC_DUMP_ZAP, which is used to enumerate directories and cache them in userland. The zfs diff stream has been modified to always return a stat object for every id being used since the libzfs_diff code would always do this anyway. A new flag was added to zfs_obj_to_stats() to prevent looking up the path the old way.
These changes are intended to produce identical results to the current diff and thus do not fix any of the problems with the current version (e.g., any of the issues with hard links).
As an extreme example, I ran a large diff on a VM with the following results:
root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
193063
real 212m49.366s
user 0m2.017s
sys 212m44.075s
and then with the changes:
root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
193063
real 0m18.139s
user 0m2.008s
sys 0m24.112s
How Has This Been Tested?
The test suite, which includes zfs diff tests was successfully run. In addition, many long running diffs were created to compare both speed and correctness against the current version.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [x] My code follows the ZFS on Linux code style requirements.
- [ ] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [x] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain
Signed-off-by.
Codecov Report
Merging #10391 into master will increase coverage by
0.05%. The diff coverage is81.20%.
@@ Coverage Diff @@
## master #10391 +/- ##
==========================================
+ Coverage 79.43% 79.48% +0.05%
==========================================
Files 393 393
Lines 124664 124834 +170
==========================================
+ Hits 99021 99222 +201
+ Misses 25643 25612 -31
| Flag | Coverage Δ | |
|---|---|---|
| #kernel | 80.14% <88.39%> (-0.03%) |
:arrow_down: |
| #user | 64.82% <60.50%> (+0.28%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/libzfs/libzfs_dataset.c | 76.95% <ø> (ø) |
|
| lib/libzfs/os/linux/libzfs_mount_os.c | 68.86% <ø> (ø) |
|
| lib/libzfs/os/linux/libzfs_util_os.c | 54.54% <ø> (-2.38%) |
:arrow_down: |
| lib/libzpool/kernel.c | 64.23% <0.00%> (-0.30%) |
:arrow_down: |
| module/zfs/dsl_dataset.c | 92.52% <50.00%> (+0.34%) |
:arrow_up: |
| module/os/linux/zfs/zfs_znode.c | 85.68% <73.91%> (-0.51%) |
:arrow_down: |
| lib/libzfs/libzfs_diff.c | 74.18% <77.34%> (+5.12%) |
:arrow_up: |
| module/zfs/zfs_ioctl.c | 85.95% <84.00%> (-0.53%) |
:arrow_down: |
| module/zfs/dmu_diff.c | 90.00% <92.94%> (+2.12%) |
:arrow_up: |
| lib/libzfs_core/libzfs_core.c | 85.32% <100.00%> (+0.58%) |
:arrow_up: |
| ... and 56 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 6774931...9ae8ab3. Read the comment docs.
Somewhat adjacent to this PR, is there some way to speed up the zap_value_search() implementation we have? OsX has to look up NAME from vget(id), we initially did it with zap_value_search() but it was too slow, we had to change it to cache names in znodes instead. Having a fast name look up in place of zap_value_search would help both issues.
@lundman it would be nice to have an optimized zap_value_search(), There are issues of course, like id -> path is 1 to many (b/c of links). It also has issues with orphaning, e.g., hard link to an object and then deleting the directory can leave the PARENT entry pointing to a removed objid. This fix is really targeted on improving the performance of zfs diff with its existing warts.
zfs diff was found to be extremely slow to the point of being almost unusable. These changes often improve performance 10 - 100x.
My understanding is that we would expect these changes to improve performance only when there are "large" directories. With more typical size directories, zfs diff should not be extremely slow to begin with. How many directory entries are needed to experience the problem and get an improvement from this PR?
@ahrens Thanks for looking...
There are a lot of factors including the number of changes that occurred (each one requires a path lookup), the depth of directories (each level requires a lookup) and the size of the zap for each directory. Fragmentation plays a role as well.
On a new pool, the current version of zfs diff can run very fast, but on a mature pool even small directories can take a while. For example, I reran the test from above, but limited the changes to a single directory with 14 files in it. I snapped, deleted the 14 files and timed the result:
root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
15
real 0m12.525s
user 0m0.009s
sys 0m12.405s
vs
root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
15
real 0m0.745s
user 0m0.377s
sys 0m0.794s
@ahrens I moved the ioctl's to use the nvlist versions, including the diff ioctl, which has the effect of versioning diff as well. So, a mismatched userland/kernel diff will fail, but will not blow up beyond that. I also reverted zfs_stat_t to prevent zfs_cmd_t from changing size.
is this change going to be merged at some point?
@ColinIanKing I expect some version of this will be merged after a few rounds of feedback.
Apologies for asking again; is this change going to be merged at some point?
I've resolved contlicts and created patch for this PR, its available here. The patch already included into experimental build Debian package using latest master branch, passed ZFS test suite, you can try it out: https://github.com/terem42/zfs-debian Instruction on Debian repo usage are inside readme.md
The results looks promising. @behlendorf @ahrens are there any open changes to do? I talked to @jwpoduska and they are using this at datto for a while now... Would be great to get faster zfs diff.
Anyway much thanks!
This would be nice to see - I just added the desire to get something like this to the Hackathon project ideas not having seen this PR (or if I did, having forgot).
If @jwpoduska has no {time, interest} in rebasing this and resolving any outstanding compat issues, I may grab @terem42's rebase and do an updated PR...
@behlendorf JFYI: Considering the number of open PRs i guess this PR can be closed in favor of mentioned #12837 which its the successor of this PR, right?
Is there a reason why this PR is not followed up upon? I did not realize how slow zfs diffwas until I tried it on a slow hard disk raidz pool. I actually aborted it after it ran for more than a day. Calling zfs diff unusable in many cases seems no understatement.
If you want to pick up #12837, go for it, I just gave up because it got ignored.