DAOS-17338 object: optimize sgl handling to reduce iov buffer count
This update modifies obj_sgls_dup and obj_dup_sgls_free to merge small or fragmented IOV buffers, reducing the scatter-gather list (SGL) IOV count and mitigating resource exhaustion during network bulk transfers.
IOV Buffer Merging Logic: Buffers smaller than 64 bytes are now merged into a larger contiguous buffer. Sequential buffers exceeding 512 entries (each ≤512KB) are consolidated into a single buffer to minimize IOV count. Original data is copied into the merged buffer, preserving content integrity.
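As a rough illustration of the small-fragment case described above, the sketch below coalesces each run of adjacent sub-64-byte fragments into one allocated buffer and passes larger fragments through untouched. `struct frag`, `SMALL_FRAG_BYTES`, and `merge_small_frags` are hypothetical names used only for illustration; they are not the DAOS types or the code in `obj_sgls_dup`, and the entry-count trigger and size caps are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an IOV entry; not the DAOS d_iov_t definition. */
struct frag {
	void   *buf;
	size_t  len;
};

#define SMALL_FRAG_BYTES 64 /* threshold taken from the PR description */

/*
 * Coalesce each run of two or more adjacent fragments shorter than
 * SMALL_FRAG_BYTES into one newly allocated buffer. All other fragments are
 * passed through by reference. owned[k] is set to 1 when out[k].buf was
 * allocated here. Returns the number of output fragments, or -1 if an
 * allocation fails (buffers allocated so far are released first).
 * out and owned must have room for nr entries.
 */
static int
merge_small_frags(const struct frag *in, int nr, struct frag *out, int *owned)
{
	int i = 0, n = 0;

	assert(in != NULL && out != NULL && owned != NULL);
	while (i < nr) {
		int    j = i;
		size_t total = 0, off = 0;
		char  *dst;

		/* Find the run of small fragments starting at i. */
		while (j < nr && in[j].len < SMALL_FRAG_BYTES)
			total += in[j++].len;

		if (j - i < 2) {
			/* No run worth merging: reuse the caller's buffer. */
			out[n] = in[i];
			owned[n++] = 0;
			i++;
			continue;
		}

		dst = malloc(total ? total : 1);
		if (dst == NULL) {
			for (int k = 0; k < n; k++)
				if (owned[k])
					free(out[k].buf);
			return -1;
		}
		/* Copy the original data, in order, into the merged buffer. */
		for (int k = i; k < j; k++) {
			memcpy(dst + off, in[k].buf, in[k].len);
			off += in[k].len;
		}
		out[n].buf = dst;
		out[n].len = total;
		owned[n++] = 1;
		i = j;
	}
	return n;
}
```

In this sketch the caller frees every `out[k].buf` whose `owned[k]` is set once the I/O completes. When no run qualifies, nothing is allocated, which matches the "no additional memory allocation if no merging is required" behavior described in this PR.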
Memory Optimization: No additional memory allocation occurs if no merging is required. For write operations, merged cases incur one memory allocation and copy. For fetch operations, two copies are performed to ensure unread regions remain unmodified (critical for CI/test validation).
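One plausible reading of the two-copy fetch path is a gather before the fetch and a scatter after it: the caller's current fragment contents are first copied into the merged buffer, so that when the whole buffer is copied back, bytes the fetch never wrote keep their original values. The sketch below shows only that idea; `struct frag`, `gather`, and `scatter` are hypothetical names, and the actual code in this PR may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an IOV entry; not the DAOS d_iov_t definition. */
struct frag {
	void   *buf;
	size_t  len;
};

/* Copy 1: seed the staging buffer with the caller's current contents. */
static size_t
gather(const struct frag *frags, int nr, char *staging)
{
	size_t off = 0;

	for (int i = 0; i < nr; i++) {
		memcpy(staging + off, frags[i].buf, frags[i].len);
		off += frags[i].len;
	}
	return off; /* total bytes staged */
}

/* Copy 2: after the fetch, write the staging buffer back to the fragments. */
static void
scatter(const struct frag *frags, int nr, const char *staging)
{
	size_t off = 0;

	for (int i = 0; i < nr; i++) {
		memcpy(frags[i].buf, staging + off, frags[i].len);
		off += frags[i].len;
	}
}
```

If the fetch fills only the first part of the staging buffer, the trailing bytes scattered back are the ones gathered earlier, so the caller's unread regions are unchanged.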
Reduces SGL memory fragmentation and IOV buffer count, improving bulk transfer efficiency. Addresses edge cases where excessive IOV entries could exhaust network-layer resources.
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Ticket title is 'Copy data buffer for unfriendly I/O (too fragmented, too small fragment)' Status is 'In Review' Job should run at elevated priority (1) https://daosio.atlassian.net/browse/DAOS-17338
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/1/execution/node/267/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/1/execution/node/302/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/1/execution/node/266/log
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-16186/2/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/2/execution/node/1397/log
ping reviewers..
I am a little confused by this:
IOV Buffer Merging Logic: Buffers smaller than 64 bytes are now merged into a larger contiguous buffer. Sequential buffers exceeding 512 entries (each ≤512KB) are consolidated into a single buffer to minimize IOV count. Original data is copied into the merged buffer, preserving content integrity.
You say sequential buffers exceeding 512 entries (I think you mean the SGL entry count here, not bytes) are consolidated. I'm curious why only in that case. What is wrong with consolidating a lower count of entries into a single buffer? That is actually essential.
The 512KB size also sounds puzzling to me, but maybe I misunderstood. Do you mean each combined SGL buffer should be less than 512KB? What if the combined buffer is bigger? Do you abandon the merge?
You say sequential buffers exceeding 512 entries (I think you mean the SGL entry count here, not bytes) are consolidated. I'm curious why only in that case. What is wrong with consolidation
Yes, it currently checks both the IOV buffer count and the size of each IOV buffer to decide whether to merge. Since merging also introduces extra memory overhead, I want to be cautious about this in the first version.
The 512KB size also sounds puzzling to me, but maybe I misunderstood. Do you mean each combined SGL buffer should be less than 512KB? What if the combined buffer is bigger? Do you abandon the merge?
No, 512KB just means that fragments of up to that size are considered for merging. Currently the maximum size of one combined buffer is 16MiB.
This PR is failing NLT because it causes a server segfault there:
Core was generated by `/home/chaarawi/install/daos/bin/daos_engine -t 4 -x 2 -f 0 -g daos_server -d /t'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055bb2c705b22 in setbit_range (end=4294967295, start=0, bitmap=0x7f0ec2316ed0 '\377' <repeats 200 times>...) at src/include/daos/common.h:247
247 setbit(bitmap, index);
[Current thread is 1 (Thread 0x7f0ecd26c640 (LWP 2282148))]
Missing separate debuginfos, use: dnf debuginfo-install daxctl-libs-71.1-8.el9.x86_64 glibc-2.34-60.el9_2.7.x86_64 hwloc-libs-2.4.1-5.el9.x86_64 kmod-libs-28-7.el9.x86_64 libaio-0.3.111-13.el9.x86_64 libgcc-11.3.1-4.3.el9.x86_64 libibverbs-44.0-2.el9.x86_64 libnl3-3.7.0-1.el9.x86_64 libunwind-1.6.2-1.el9.x86_64 libuuid-2.37.4-11.el9_2.x86_64 libyaml-0.2.5-7.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 ndctl-libs-71.1-8.el9.x86_64 numactl-libs-2.0.14-9.el9.x86_64 systemd-libs-252-14.el9_2.3.0.1.x86_64 xz-libs-5.2.5-8.el9_0.x86_64
(gdb) bt
#0 0x000055bb2c705b22 in setbit_range (end=4294967295, start=0, bitmap=0x7f0ec2316ed0 '\377' <repeats 200 times>...) at src/include/daos/common.h:247
#1 setbits64 (bits=0, at=0, bmap=0x7f0ec2316ed0) at src/include/daos/common.h:253
#2 sgls_set_merged_bitmap (ctx=ctx@entry=0x7f0dba3ee630, sg=sg@entry=0x7f0ec1dc3bc8, frag_start=frag_start@entry=0, frag_chain=frag_chain@entry=0, i=i@entry=0, j=0, bitmap_nr=1, update=true, dup=0x7f0dba3ee62f)
at src/object/cli_obj.c:4627
#3 0x000055bb2c70dc97 in obj_sgls_dup (obj_auxi=0x7f0ec1ed4f38, args=args@entry=0x7f0ec1ed4ea8, update=update@entry=true) at src/object/cli_obj.c:4763
#4 0x000055bb2c726bb5 in dc_obj_update (task=task@entry=0x7f0ec1ed4e10, epoch=epoch@entry=0x7f0dba3ee720, map_ver=1, args=args@entry=0x7f0ec1ed4ea8, obj=0x7f0ec1f1a210) at src/object/cli_obj.c:6172
#5 0x000055bb2c727daa in dc_obj_update_task (task=0x7f0ec1ed4e10) at src/object/cli_obj.c:6281
#6 0x00007f0f03ee596d in tse_task_schedule_with_delay (task=task@entry=0x7f0ec1ed4e10, instant=instant@entry=true, delay=delay@entry=0) at src/common/tse.c:1047
#7 0x00007f0f03ee5bc2 in tse_task_schedule (task=task@entry=0x7f0ec1ed4e10, instant=instant@entry=true) at src/common/tse.c:1056
#8 0x000055bb2c6088aa in dsc_task_run (task=0x7f0ec1ed4e10, retry_cb=retry_cb@entry=0x0, arg=arg@entry=0x7f0dba3ee858, arg_size=arg_size@entry=8, sync=sync@entry=true) at src/engine/srv_cli.c:112
#9 0x00007f0eff39f96e in dsc_obj_update (oh=..., flags=flags@entry=0, dkey=dkey@entry=0x7f0ec1dc3b68, nr=1, iods=iods@entry=0x7f0ec1dc43c8, sgls=sgls@entry=0x7f0ec1dc3bc8) at src/object/srv_cli.c:131
#10 0x00007f0eff459138 in cont_send_oit_bucket (oa=oa@entry=0x7f0ec1dc3b50, bucket_id=<optimized out>, bucket_id@entry=366) at src/container/srv_oi_table.c:82
#11 0x00007f0eff45a221 in cont_child_gather_oids (coc=0x7f0ec1d71f00, coh_uuid=coh_uuid@entry=0x7f0dba3e8ba0 "\331>\340\375\016\225M\026\275\060\221zf\211", <incomplete sequence \340>, epoch=2223939632600186880,
oit_oid=...) at src/container/srv_oi_table.c:209
#12 0x00007f0eff43e673 in cont_snap_notify_one (vin=0x7f0dba3e8b80) at src/container/srv_target.c:2186
#13 0x000055bb2c6305ad in collective_func (varg=0x7f0ef20e2dd0) at src/engine/ult.c:56
#14 0x00007f0f03a0944c in ABTD_ythread_func_wrapper (p_arg=0x7f0dba3eede0) at arch/abtd_ythread.c:12
#15 0x00007f0f039f7d29 in ABTD_ythread_context_func_wrapper (p_fctx=<optimized out>) at ../src/include/abtd_fcontext.h:74
#16 0x0000000000000000 in ?? ()
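Frames #0 and #1 show `setbits64` called with `bits=0` and `at=0`, and `setbit_range` receiving `end=4294967295`, which is `UINT32_MAX`. That is consistent with an unsigned `at + bits - 1` wrapping around when `bits` is 0, so the loop walks far past the bitmap. The sketch below only reproduces that arithmetic; `range_end` and `range_end_checked` are hypothetical helpers, not the code in `src/include/daos/common.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: last bit index of a range of `bits` bits from `at`. */
static uint32_t
range_end(uint32_t at, uint32_t bits)
{
	/* Unsigned arithmetic: wraps to UINT32_MAX when at == 0 and bits == 0. */
	return at + bits - 1;
}

/* Guarded variant: rejects an empty range instead of wrapping. */
static int
range_end_checked(uint32_t at, uint32_t bits, uint32_t *end)
{
	if (bits == 0)
		return -1;
	*end = at + bits - 1;
	return 0;
}
```

A caller that computes `bits` from an uninitialized or zero-valued setting would hit the wrapping case, so validating the count before setting bits is one way to fail safely.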
Thanks, I missed initializing the environment in the server case.
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/8/display/redirect
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/9/display/redirect
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16186/9/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/10/display/redirect
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/10/execution/node/1299/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/13/execution/node/1423/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/14/execution/node/1509/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/15/execution/node/441/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/16/display/redirect
I can't see the results, but it passed the 2.6 tests (except for a linting error) and also passed here (https://github.com/daos-stack/daos/pull/16566), so I'm landing it.