daos
DAOS-4183 engine: reduce virtual memory and swap footprint
To limit the virtual memory and swap footprint, mmap() only the exact requested stack size (the kernel rounds it up to the page size), manage the different stack sizes with a b-tree, and use the MAP_NORESERVE flag.
Required-githooks: true
Signed-off-by: Bruno Faccini [email protected]
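For readers unfamiliar with the flags named in the commit message, here is a minimal sketch of mapping a stack of exactly the requested size with MAP_NORESERVE on Linux. It is not the code from this PR: the helper names `stack_size_round_up`, `stack_map`, and `stack_unmap` are hypothetical, the flag combination is an assumption, and the b-tree that tracks the different stack sizes is not shown.

```c
#define _GNU_SOURCE /* exposes MAP_STACK and MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: the length the kernel effectively maps, i.e. the
 * requested size rounded up to a multiple of the page size. */
static size_t stack_size_round_up(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	return (size + page - 1) / page * page;
}

/* Hypothetical helper: map only the requested stack size.
 * MAP_NORESERVE asks the kernel not to reserve swap space for the mapping;
 * pages are only backed once they are touched. Returns NULL on failure. */
static void *stack_map(size_t size)
{
	void *stack;

	assert(size > 0);
	stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK | MAP_NORESERVE,
		     -1, 0);
	return stack == MAP_FAILED ? NULL : stack;
}

/* Hypothetical helper: release a stack obtained from stack_map(). */
static int stack_unmap(void *stack, size_t size)
{
	return munmap(stack, stack_size_round_up(size));
}
```

A caller would pass the same requested size to `stack_map()` and `stack_unmap()`. Because the mapped length is always a multiple of the page size, stacks could plausibly be grouped by their rounded size when they are pooled for reuse.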
Before requesting gatekeeper:
- [ ] Two review approvals and any prior change requests have been resolved.
- [ ] Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
- [ ] `Features:` (or `Test-tag*`) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
- [ ] Commit messages follow the guidelines outlined here.
- [ ] Any tests skipped by the ticket being addressed have been run and passed in the PR.
Gatekeeper:
- [ ] You are the appropriate gatekeeper to be landing the patch.
- [ ] The PR has 2 reviews by people familiar with the code, including appropriate watchers.
- [ ] Githooks were used. If not, request that user install them and check copyright dates.
- [ ] Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
- [ ] All builds have passed. Check non-required builds for any new compiler warnings.
- [ ] Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
- [ ] If applicable, the PR has addressed any potential version compatibility issues.
- [ ] Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
- [ ] Extra checks if forced landing is requested
- [ ] Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
- [ ] No new NLT or valgrind warnings. Check the classic view.
- [ ] Quick-build or Quick-functional is not used.
- [ ] Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.
Bug-tracker data: Ticket title is 'io-server segfaults when pmdk built with ndctl' Status is 'In Review' Labels: 'q4_fix,triaged' https://daosio.atlassian.net/browse/DAOS-4183
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/1/execution/node/145/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/1/execution/node/1083/log
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/3/execution/node/146/log
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/4/execution/node/146/log
!!! This PR has allowed runs with "dedup:memcmp" properties to succeed on Frontera, where they previously failed with ENOMEM.
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/4/execution/node/343/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/4/execution/node/299/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/4/execution/node/338/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/4/execution/node/439/log
Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/4/execution/node/478/log
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/5/execution/node/145/log
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/6/execution/node/146/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/6/execution/node/1084/log
There were only 2 tests in error during the last CI session:
Existing failures - 2 Test Hardware / Functional Hardware Medium / 3-./osa/online_extend.py:OSAOnlineExtend.test_osa_online_extend_oclass;run-aggregation-checksum-container-daos_racer-extra_servers-hosts-ior-client_processes-iorflags-job_manager-loop_test-mdtest-wr_size-32K-pool-rebuild-server_config-engines-0-storage-0-1-setup-test_obj_class-test_ranks-dc35 – FTEST_osa.OSAOnlineExtend -->> Time-out already being addressed by DAOS-12054
Test Hardware / Functional Hardware Medium / 1-./scrubber/target_auto_eviction.py:TestWithScrubberTargetEviction.test_scrubber_ssd_auto_eviction;run-agent_config-transport_config-container-dmg-faults-hosts-ior-client_processes-pool-server_config-engines-0-storage-0-1-setup-9bf0 – FTEST_scrubber.TestWithScrubberTargetEviction -->> Time-out already being addressed by DAOS-11950
Is it OK not to rerun CI (and thus not add more to the Jenkins job queue)?
@johannlombardi @NiuYawei can you review when you have some time ?! thx in advance ;-)
One comment, it looks like this enables the feature by default too. Should that be in the message?
Oops, right @jolivier23, I forgot that I had enabled it to expose it to full CI testing. Should I remove the specific change that enables it by default (is everybody OK with enabling mmap()'ed ULT stacks by default?), or just change the main PR message to mention it, as you suggested?
I'd be ok with just changing the description but it's probably a question for @johannlombardi whether we should enable the feature by default.
@johannlombardi I know you previously reviewed and approved but just wanted to make sure you were ok with the change of default in particular for this feature.
I am OK with the patch and with eventually changing the default. The issue is that we still saw a perf impact on Frontera for IOPS benchmarks, IIRC. If so, we should address this before enabling it by default.
Ah, did you get new, bad perf numbers from Dalton? I will push a new commit to remove the default enabling.
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/7/execution/node/174/log
Test stage Scan Leap 15.4 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/7/execution/node/890/log
One more review round, sorry guys. To be honest, I don't remember the last time I forgot to ask you to review again :-(
Even if the mmap()'ed ULT stacks feature seems to introduce some penalty, I would like this PR to land, since the feature can at least be used for debugging purposes. I would like some feedback on what all my reviewers think about this.
I think if it's disabled by default, or at least for release builds, maybe it's okay to land?
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/10/execution/node/1267/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10808/10/execution/node/1313/log
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-10808/10/display/redirect