Dispersed volume becomes nearly unresponsive while a replaced brick is healing with performance.iot-pass-through enabled
Description of problem:
Gluster 11.1, dispersed 4+1, 200 bricks across 20 nodes
I've run into a rather odd scenario. I have a dispersed cluster with global threading enabled, and performance.iot-pass-through enabled as recommended in the documentation. I had a drive/brick go bad, so I followed the reset-brick process to replace it and allow it to rebuild.
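For reference, the brick replacement followed the standard reset-brick flow, roughly as sketched below; the volume name and brick path are placeholders, not my actual names:

# take the failing brick offline (volume/host/brick names are placeholders)
gluster volume reset-brick myvol san01:/bricks/brick07 start
# ...swap or reformat the drive and remount it at the same path...
# bring the new empty brick back into the volume and let the heal start
gluster volume reset-brick myvol san01:/bricks/brick07 san01:/bricks/brick07 commit force
# watch heal progress
gluster volume heal myvol info summary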
After initially replacing the brick, CPU usage and throughput on all the SAN nodes dropped sharply, even on nodes hosting subvolumes unrelated to the one with the replaced/healing drive, and the client mounts became almost completely unresponsive. All but two clients are mounted read-only. As a test, I blocked the SAN node performing the drive rebuild from all the read-only clients, but did not block the two clients with rw access, and everything became responsive again. On further troubleshooting, I disabled performance.iot-pass-through, unblocked that node from the read-only clients, and things remained responsive. To test the theory, I re-enabled iot-pass-through and everything became unresponsive again; disabling it brought everything back to life.
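The toggling during troubleshooting was just the usual volume set/get calls, roughly like this (volume name is a placeholder):

gluster volume set myvol performance.iot-pass-through disable   # cluster becomes responsive again
gluster volume set myvol performance.iot-pass-through enable    # cluster goes unresponsive while the heal is running
gluster volume get myvol performance.iot-pass-through           # confirm the currently active value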
It seems odd that replacing 1 of 200 bricks would have such a significant impact on performance with this setting enabled.
The exact commands to reproduce the issue (sketched below):
1. Enable global threading (config.global-threading).
2. Enable performance.iot-pass-through.
3. Replace a brick using reset-brick and let the heal run.
4. See the performance degradation.
5. Disable performance.iot-pass-through.
6. See the performance improvement.
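The initial configuration for the repro is roughly the following; the reset-brick and iot-pass-through toggle commands are the same as shown above, and "myvol" is again a placeholder:

gluster volume set myvol config.global-threading on
gluster volume set myvol performance.iot-pass-through enable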
Hosts are Ubuntu 22.04 running Gluster 11.1.
Options Reconfigured:
disperse.shd-max-threads: 4
disperse.background-heals: 4
performance.read-ahead-page-count: 4
config.brick-threads: 16
config.client-threads: 16
cluster.rmdir-optimize: off
performance.readdir-ahead-pass-through: off
dht.force-readdirp: true
disperse.eager-lock: on
performance.least-prio-threads: 2
server.event-threads: 8
client.event-threads: 8
server.outstanding-rpc-limit: 128
performance.md-cache-timeout: 600
performance.md-cache-statfs: on
performance.iot-cleanup-disconnected-reqs: off
cluster.background-self-heal-count: 4
performance.read-ahead-pass-through: disable
performance.write-behind-pass-through: disable
performance.open-behind-pass-through: disable
performance.nl-cache-pass-through: disable
performance.io-cache-pass-through: disable
performance.md-cache-pass-through: disable
performance.quick-read-cache-size: 256MB
cluster.rebal-throttle: aggressive
features.scrub-freq: monthly
features.scrub-throttle: normal
features.scrub: Active
features.bitrot: on
performance.quick-read: off
performance.open-behind: on
performance.write-behind: on
performance.io-cache: on
performance.write-behind-window-size: 128MB
performance.rda-cache-limit: 1GB
performance.cache-max-file-size: 7GB
performance.cache-size: 8GB
performance.nl-cache-timeout: 600
performance.nl-cache: off
performance.parallel-readdir: enable
performance.cache-invalidation: true
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.client-io-threads: on
cluster.lookup-optimize: on
performance.flush-behind: on
performance.read-ahead: off
cluster.lookup-unhashed: off
cluster.weighted-rebalance: off
performance.readdir-ahead: on
cluster.readdir-optimize: off
cluster.min-free-disk: 5%
network.compression.mem-level: -1
network.compression: off
storage.build-pgfid: on
config.global-threading: on
performance.iot-pass-through: disable
cluster.force-migration: disable
cluster.disperse-self-heal-daemon: enable
performance.cache-refresh-timeout: 60
performance.enable-least-priority: on
locks.trace: off
storage.linux-io_uring: on
server.tcp-user-timeout: 60
performance.stat-prefetch: off
performance.xattr-cache-list: *
performance.cache-capability-xattrs: on
performance.quick-read-cache-invalidation: on
performance.cache-samba-metadata: on
server.manage-gids: off
performance.nl-cache-positive-entry: disable
client.send-gids: off
features.acl: off
disperse.read-policy: gfid-hash
disperse.stripe-cache: 10
cluster.brick-multiplex: off
cluster.enable-shared-storage: disable