zfs icon indicating copy to clipboard operation
zfs copied to clipboard

Optimize zfs diff

Open jwpoduska opened this issue 5 years ago • 15 comments

Motivation and Context

zfs diff was found to be extremely slow to the point of being almost unusable. These changes often improve performance 10 - 100x.

The current implementation of zfs diff uses zfs_obj_to_stats(), which in turn uses zfs_obj_to_path() to get the path to the dnode. The path is then built using zap_value_search() to expand the parents. The problem is zap_value_search() doesn't do a lookup, it does a linear search through the entire directory object, which leads to n^2 behavior.

Description

This changeset adds the ioctl ZFS_IOC_DUMP_ZAP, which is used to enumerate directories and cache them in userland. The zfs diff stream has been modified to always return a stat object for every id being used since the libzfs_diff code would always do this anyway. A new flag was added to zfs_obj_to_stats() to prevent looking up the path the old way.

These changes are intended to produce identical results to the current diff and thus do not fix any of the problems with the current version (e.g., any of the issues with hard links).

As an extreme example, I ran a large diff on a VM with the following results:

root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
193063
real    212m49.366s
user    0m2.017s
sys     212m44.075s

and then with the changes:

root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
193063
real    0m18.139s
user    0m2.008s
sys     0m24.112s

How Has This Been Tested?

The test suite, which includes zfs diff tests was successfully run. In addition, many long running diffs were created to compare both speed and correctness against the current version.

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the ZFS on Linux code style requirements.
  • [ ] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [x] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

jwpoduska avatar Jun 01 '20 15:06 jwpoduska

Codecov Report

Merging #10391 into master will increase coverage by 0.05%. The diff coverage is 81.20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10391      +/-   ##
==========================================
+ Coverage   79.43%   79.48%   +0.05%     
==========================================
  Files         393      393              
  Lines      124664   124834     +170     
==========================================
+ Hits        99021    99222     +201     
+ Misses      25643    25612      -31     
Flag Coverage Δ
#kernel 80.14% <88.39%> (-0.03%) :arrow_down:
#user 64.82% <60.50%> (+0.28%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
lib/libzfs/libzfs_dataset.c 76.95% <ø> (ø)
lib/libzfs/os/linux/libzfs_mount_os.c 68.86% <ø> (ø)
lib/libzfs/os/linux/libzfs_util_os.c 54.54% <ø> (-2.38%) :arrow_down:
lib/libzpool/kernel.c 64.23% <0.00%> (-0.30%) :arrow_down:
module/zfs/dsl_dataset.c 92.52% <50.00%> (+0.34%) :arrow_up:
module/os/linux/zfs/zfs_znode.c 85.68% <73.91%> (-0.51%) :arrow_down:
lib/libzfs/libzfs_diff.c 74.18% <77.34%> (+5.12%) :arrow_up:
module/zfs/zfs_ioctl.c 85.95% <84.00%> (-0.53%) :arrow_down:
module/zfs/dmu_diff.c 90.00% <92.94%> (+2.12%) :arrow_up:
lib/libzfs_core/libzfs_core.c 85.32% <100.00%> (+0.58%) :arrow_up:
... and 56 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6774931...9ae8ab3. Read the comment docs.

codecov[bot] avatar Jun 01 '20 22:06 codecov[bot]

Somewhat adjacent to this PR, is there some way to speed up the zap_value_search() implementation we have? OsX has to look up NAME from vget(id), we initially did it with zap_value_search() but it was too slow, we had to change it to cache names in znodes instead. Having a fast name look up in place of zap_value_search would help both issues.

lundman avatar Jun 02 '20 09:06 lundman

@lundman it would be nice to have an optimized zap_value_search(), There are issues of course, like id -> path is 1 to many (b/c of links). It also has issues with orphaning, e.g., hard link to an object and then deleting the directory can leave the PARENT entry pointing to a removed objid. This fix is really targeted on improving the performance of zfs diff with its existing warts.

jwpoduska avatar Jun 09 '20 12:06 jwpoduska

zfs diff was found to be extremely slow to the point of being almost unusable. These changes often improve performance 10 - 100x.

My understanding is that we would expect these changes to improve performance only when there are "large" directories. With more typical size directories, zfs diff should not be extremely slow to begin with. How many directory entries are needed to experience the problem and get an improvement from this PR?

ahrens avatar Jun 11 '20 02:06 ahrens

@ahrens Thanks for looking...

There are a lot of factors including the number of changes that occurred (each one requires a path lookup), the depth of directories (each level requires a lookup) and the size of the zap for each directory. Fragmentation plays a role as well.

On a new pool, the current version of zfs diff can run very fast, but on a mature pool even small directories can take a while. For example, I reran the test from above, but limited the changes to a single directory with 14 files in it. I snapped, deleted the 14 files and timed the result:

root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
15
real    0m12.525s
user    0m0.009s
sys     0m12.405s

vs

root@ub41:~# time zfs diff dattoArray/xxxData/123456@snap0 @snap1 | wc -l
15
real    0m0.745s
user    0m0.377s
sys     0m0.794s

jwpoduska avatar Jun 12 '20 11:06 jwpoduska

@ahrens I moved the ioctl's to use the nvlist versions, including the diff ioctl, which has the effect of versioning diff as well. So, a mismatched userland/kernel diff will fail, but will not blow up beyond that. I also reverted zfs_stat_t to prevent zfs_cmd_t from changing size.

jwpoduska avatar Jun 25 '20 15:06 jwpoduska

is this change going to be merged at some point?

ColinIanKing avatar Jul 09 '20 08:07 ColinIanKing

@ColinIanKing I expect some version of this will be merged after a few rounds of feedback.

behlendorf avatar Jul 10 '20 02:07 behlendorf

Apologies for asking again; is this change going to be merged at some point?

ColinIanKing avatar Sep 02 '20 12:09 ColinIanKing

I've resolved contlicts and created patch for this PR, its available here. The patch already included into experimental build Debian package using latest master branch, passed ZFS test suite, you can try it out: https://github.com/terem42/zfs-debian Instruction on Debian repo usage are inside readme.md

terem42 avatar Nov 04 '20 17:11 terem42

The results looks promising. @behlendorf @ahrens are there any open changes to do? I talked to @jwpoduska and they are using this at datto for a while now... Would be great to get faster zfs diff.

Anyway much thanks!

jumbi77 avatar Apr 15 '21 16:04 jumbi77

This would be nice to see - I just added the desire to get something like this to the Hackathon project ideas not having seen this PR (or if I did, having forgot).

If @jwpoduska has no {time, interest} in rebasing this and resolving any outstanding compat issues, I may grab @terem42's rebase and do an updated PR...

rincebrain avatar Oct 30 '21 17:10 rincebrain

@behlendorf JFYI: Considering the number of open PRs i guess this PR can be closed in favor of mentioned #12837 which its the successor of this PR, right?

jumbi77 avatar Aug 05 '22 18:08 jumbi77

Is there a reason why this PR is not followed up upon? I did not realize how slow zfs diffwas until I tried it on a slow hard disk raidz pool. I actually aborted it after it ran for more than a day. Calling zfs diff unusable in many cases seems no understatement.

riedel avatar Feb 05 '24 22:02 riedel

If you want to pick up #12837, go for it, I just gave up because it got ignored.

rincebrain avatar Feb 05 '24 22:02 rincebrain