
NAS-130619 / 25.10 / Use truenas_pylibzfs in pool.dataset.query

Open yocalebo opened this issue 6 months ago • 2 comments

This replaces the use of zfs.dataset.query with truenas_pylibzfs (new C module). It was done in a way that lets us "drop in" the new library without having to make any major API changes. We will write a new query endpoint that is vastly simpler, even more efficient, and much more ergonomic to use, but this change addresses 95% of the problem that we currently have with our process pool. This is probably the most-called API endpoint, both internally and externally, so the performance gains can't be overstated. I did a very synthetic comparison between the two endpoints and the results are pretty substantial.

Old API Performance:

  • Times: [0.228, 0.229, 0.216, 0.224, 0.211] seconds
  • Average: 0.222 seconds
  • Range: 0.211 - 0.229 seconds (0.018s spread)
  • Standard Deviation: ~0.007 seconds

New API Performance:

  • Times: [0.063, 0.062, 0.062, 0.062, 0.062] seconds
  • Average: 0.062 seconds
  • Range: 0.062 - 0.063 seconds (0.001s spread)
  • Standard Deviation: ~0.0005 seconds

Speed Improvement:

  • 3.58x faster (0.222s → 0.062s)
  • 72.1% reduction in response time

Consistency Improvement:

  • Much more consistent response times (±0.0005s vs ±0.007s)
  • 14x more stable performance
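The summary figures can be re-derived from the raw timings. The following is a minimal sketch, not part of the PR: the `summarize` helper is hypothetical, it assumes the population standard deviation was used, and it rounds the averages to three decimals before comparing them, as the lists above do.

```python
from statistics import mean, pstdev

# Raw timings (seconds) copied from the two lists above.
old_times = [0.228, 0.229, 0.216, 0.224, 0.211]
new_times = [0.063, 0.062, 0.062, 0.062, 0.062]


def summarize(times):
    """Return average, range, and population standard deviation of the samples."""
    return {
        "average": round(mean(times), 3),
        "min": min(times),
        "max": max(times),
        "spread": round(max(times) - min(times), 3),
        "stdev": pstdev(times),  # assumption: population, not sample, deviation
    }


old_stats = summarize(old_times)
new_stats = summarize(new_times)

# Speedup and relative reduction, based on the rounded averages.
speedup = old_stats["average"] / new_stats["average"]
reduction = 1 - new_stats["average"] / old_stats["average"]

print(f"old: {old_stats}")
print(f"new: {new_stats}")
print(f"speedup: {speedup:.2f}x, reduction: {reduction:.1%}")
```

With only five samples per endpoint, the standard deviations are rough indicators rather than precise measurements.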

The performance characteristics are also confirmed by the full CI test run. The full suite usually runs in about 3 hours. The run with these changes took a little over 2 hours.

Finally, there are some very minor differences between the old and the new implementation that should be noted. I actually consider them "cosmetic improvements".

  1. Size/Storage Formatting Changes:
       • Old API: Uses shorter format like '7.16G', '140K', '0B'
       • New API: Uses more explicit format like '7.16 GiB', '140 KiB', '0 bytes'
  2. Affected Fields:
       • available: '7.16G' → '7.16 GiB'
       • used: '2.04G' → '2.04 GiB'
       • usedbychildren: '2.04G' → '2.04 GiB'
       • usedbydataset: '140K' → '140 KiB'
       • usedbysnapshots: '0B' → '0 bytes'
       • usedbyrefreservation: '0B' → '0 bytes'
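Callers that parse these human-readable strings may want to accept both styles during the transition. The following is a minimal sketch, not part of the PR: `parse_size` is a hypothetical helper, and it assumes that both formats use binary (1024-based) multiples and that suffixes beyond the ones shown above (for example 'M' / 'MiB' or 'T' / 'TiB') follow the same pattern.

```python
import re

# Unit table built on the assumption that both formats are 1024-based.
_UNITS = {"": 1, "B": 1, "BYTE": 1, "BYTES": 1}
for power, prefix in enumerate("KMGTPE", start=1):
    _UNITS[prefix] = 1024 ** power          # old style, e.g. '140K'
    _UNITS[prefix + "IB"] = 1024 ** power   # new style, e.g. '140 KiB'

_SIZE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]*)\s*$")


def parse_size(text):
    """Convert a size string in either format to an approximate byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unrecognized unit in: {text!r}") from None
    return round(float(number) * factor)


# Both spellings of the same value map to the same byte count.
for old_style, new_style in [("7.16G", "7.16 GiB"), ("140K", "140 KiB"), ("0B", "0 bytes")]:
    assert parse_size(old_style) == parse_size(new_style)
```

Because the strings are rounded for display, the result is only an approximation of the real value. Where exact numbers matter, a raw numeric field is the better source if the API exposes one.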

yocalebo, Jun 13 '25 18:06

Jira URL: https://ixsystems.atlassian.net/browse/NAS-130619

bugclerk, Jun 13 '25 18:06

This could make a large performance improvement to the Incus Storage Driver.

After iSCSI create/delete, pool.dataset.query is the next biggest bottleneck.

mrstux, Jun 20 '25 04:06

This PR has been merged and conversations have been locked. If you would like to discuss more about this issue please use our forums or raise a Jira ticket.

bugclerk, Jun 23 '25 16:06