DAOS-15751 bio: change default bio_max_async_sz to 32k (#14568)
In md-on-ssd mode, when the update data size <= bio_max_async_sz, the local tx executes in async data I/O mode: "submit both data NVMe I/O and WAL NVMe I/O, then wait for both data I/O and WAL I/O completion". The purpose of this async data I/O mode is to reduce update latency (by one WAL I/O latency compared with the old sync mode).
However, tests showed that this async data I/O mode has slightly worse throughput than the legacy sync I/O mode when the data size and QD are large (I/O size > 32k, QD = 64 or 128).
The reason could be that async I/O mode is more likely to generate mixed I/O to the SSD (when the WAL and data blobs share the same SSD, which is the typical configuration), which might cause more severe I/O reordering on the SSD and ultimately hurt throughput.
The NVMe I/O submit pattern is as follows:
async mode: tx1 data, tx1 wal, tx2 data, tx2 wal, ...
sync mode: tx1 data, tx2 data, ... tx1 wal, tx2 wal, ...
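The two submit patterns can be reproduced with a small sketch. This is an illustration only, not DAOS code: `submit_order` is a hypothetical helper, and the sync case assumes that every in-flight tx submits its data I/O before any data I/O completes (as with a deep queue).

```python
def submit_order(num_tx, mode):
    """Return the order in which NVMe I/Os are submitted for num_tx in-flight txs.

    Hypothetical illustration; names and the queue model are assumptions.
    """
    txs = range(1, num_tx + 1)
    if mode == "async":
        # Each tx submits its data I/O and its WAL I/O back to back,
        # so data and WAL I/Os are interleaved on the device.
        return [f"tx{i} {kind}" for i in txs for kind in ("data", "wal")]
    if mode == "sync":
        # Each tx submits its WAL I/O only after its data I/O completes.
        # Assumption: all data I/Os are submitted before the first completes.
        return [f"tx{i} data" for i in txs] + [f"tx{i} wal" for i in txs]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("async", "sync"):
        print(f"{mode} mode: " + ", ".join(submit_order(2, mode)))
```

With a shared SSD, the async ordering alternates between data and WAL blobs on every submission, which is the mixed I/O described above.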
Given that async data I/O mode only helps reduce latency, we shrink the default bio_max_async_sz from 1MB to 32k, so that it still benefits latency-sensitive apps with small I/Os without impacting bandwidth-intensive apps with large I/Os.
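A minimal sketch of the effect of the new default, assuming the mode is chosen purely by the size comparison described above. The constant and function names are hypothetical and do not come from the DAOS source.

```python
# Hypothetical names; only the 1MB and 32k values come from this PR.
OLD_DEFAULT_MAX_ASYNC_SZ = 1 << 20   # 1MB
NEW_DEFAULT_MAX_ASYNC_SZ = 32 << 10  # 32k


def uses_async_data_io(update_size, max_async_sz=NEW_DEFAULT_MAX_ASYNC_SZ):
    """Return True when an update of update_size bytes takes the async data I/O path."""
    return update_size <= max_async_sz


if __name__ == "__main__":
    for size in (4 << 10, 32 << 10, 64 << 10, 1 << 20):
        old = uses_async_data_io(size, OLD_DEFAULT_MAX_ASYNC_SZ)
        new = uses_async_data_io(size)
        print(f"{size:>8} bytes: old default async={old}, new default async={new}")
```

Under this sketch, a 64K update (the transfer size named in the ticket title) took the async path with the old default and takes the sync path with the new one, while updates of 32k or less remain async.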
Before requesting gatekeeper:
- [ ] Two review approvals and any prior change requests have been resolved.
- [ ] Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
- [ ] `Features:` (or `Test-tag*`) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
- [ ] Commit messages follows the guidelines outlined here.
- [ ] Any tests skipped by the ticket being addressed have been run and passed in the PR.
Gatekeeper:
- [ ] You are the appropriate gatekeeper to be landing the patch.
- [ ] The PR has 2 reviews by people familiar with the code, including appropriate owners.
- [ ] Githooks were used. If not, request that user install them and check copyright dates.
- [ ] Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
- [ ] All builds have passed. Check non-required builds for any new compiler warnings.
- [ ] Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
- [ ] If applicable, the PR has addressed any potential version compatibility issues.
- [ ] Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
- [ ] Extra checks if forced landing is requested
- [ ] Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
- [ ] No new NLT or valgrind warnings. Check the classic view.
- [ ] Quick-build or Quick-functional is not used.
- [ ] Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.
Ticket title is 'MD-on-SSD IOR Performance low with 64K transfer size' Status is 'In Progress' Labels: '2.6.0,md_on_ssd,scrubbed_2.8' https://daosio.atlassian.net/browse/DAOS-15751
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14622/2/testReport/
@daos-stack/daos-gatekeeper , CI failed for known issues, let's force land this single parameter change PR.
Since NLT failed, none of the functional test stages ran.