DAOS-17338 object: optimize sgl handling to reduce iov buffer count

Open wangshilong opened this issue 9 months ago • 16 comments

This update modifies obj_sgls_dup and obj_dup_sgls_free to merge small or fragmented IOV buffers, reducing the scatter-gather list (SGL) IOV count and mitigating resource exhaustion during network bulk transfers.

  1. IOV Buffer Merging Logic: Buffers smaller than 64 bytes are now merged into a larger contiguous buffer. Sequential buffers exceeding 512 entries (each ≤512KB) are consolidated into a single buffer to minimize IOV count. Original data is copied into the merged buffer, preserving content integrity.

  2. Memory Optimization: No additional memory allocation occurs if no merging is required. For write operations, merged cases incur one memory allocation and copy. For fetch operations, two copies are performed to ensure unread regions remain unmodified (critical for CI/test validation).
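As a rough illustration of the write path described in the list above (one allocation plus one copy when merging is needed), here is a minimal sketch. `struct frag` and `frags_gather` are hypothetical names made up for this example; they are not the DAOS `d_iov_t` / `obj_sgls_dup` definitions, and the real code may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one I/O vector; not the DAOS d_iov_t. */
struct frag {
	void   *buf;
	size_t  len;
};

/*
 * Copy `nr` fragments into one newly allocated contiguous buffer.
 * Stores the total length in *out_len and returns the buffer (the caller
 * frees it). Returns NULL if there is nothing to copy or malloc() fails.
 */
void *
frags_gather(const struct frag *frags, size_t nr, size_t *out_len)
{
	size_t total = 0, off = 0, i;
	char  *merged;

	assert(frags != NULL && out_len != NULL);

	for (i = 0; i < nr; i++)
		total += frags[i].len;
	*out_len = total;
	if (total == 0)
		return NULL;

	merged = malloc(total);		/* the single allocation */
	if (merged == NULL)
		return NULL;

	for (i = 0; i < nr; i++) {	/* the single copy pass */
		if (frags[i].len == 0)
			continue;
		memcpy(merged + off, frags[i].buf, frags[i].len);
		off += frags[i].len;
	}
	return merged;
}
```

After the call, the transfer would describe one contiguous region instead of `nr` separate ones, which is the IOV-count reduction this PR is after.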

Reduces SGL memory fragmentation and IOV buffer count, improving bulk transfer efficiency. Addresses edge cases where excessive IOV entries could exhaust network-layer resources.
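For the fetch case mentioned in item 2, one possible reading of "two copies" is: pre-fill the merged buffer from the caller's fragments, let the fetch overwrite whatever it actually returns, then scatter the merged buffer back. That reading is an assumption, and the names below (`struct frag`, `frags_prefill`, `frags_scatter`) are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one I/O vector; not the DAOS d_iov_t. */
struct frag {
	void   *buf;
	size_t  len;
};

/* Copy 1: seed the staging buffer with the caller's current contents. */
void
frags_prefill(char *staging, const struct frag *frags, size_t nr)
{
	size_t off = 0, i;

	assert(staging != NULL && frags != NULL);
	for (i = 0; i < nr; i++) {
		if (frags[i].len > 0)
			memcpy(staging + off, frags[i].buf, frags[i].len);
		off += frags[i].len;
	}
}

/* Copy 2: after the fetch, copy the staging buffer back into the fragments. */
void
frags_scatter(const char *staging, const struct frag *frags, size_t nr)
{
	size_t off = 0, i;

	assert(staging != NULL && frags != NULL);
	for (i = 0; i < nr; i++) {
		if (frags[i].len > 0)
			memcpy(frags[i].buf, staging + off, frags[i].len);
		off += frags[i].len;
	}
}
```

Because the staging buffer starts as an exact copy of the caller's data, any byte the fetch does not touch is written back unchanged, which matches the "unread regions remain unmodified" requirement.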

Steps for the author:

  • [ ] Commit message follows the guidelines.
  • [ ] Appropriate Features or Test-tag pragmas were used.
  • [ ] Appropriate Functional Test Stages were run.
  • [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).

wangshilong avatar Apr 02 '25 08:04 wangshilong

Ticket title is 'Copy data buffer for unfriendly I/O (too fragmented, too small fragment)'
Status is 'In Review'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-17338

github-actions[bot] avatar Apr 02 '25 08:04 github-actions[bot]

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/1/execution/node/267/log

daosbuild1 avatar Apr 02 '25 19:04 daosbuild1

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/1/execution/node/302/log

daosbuild1 avatar Apr 02 '25 20:04 daosbuild1

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/1/execution/node/266/log

daosbuild1 avatar Apr 02 '25 23:04 daosbuild1

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-16186/2/testReport/

daosbuild1 avatar Apr 06 '25 22:04 daosbuild1

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16186/2/execution/node/1397/log

daosbuild1 avatar Apr 07 '25 02:04 daosbuild1

ping reviewers..

wangshilong avatar Apr 10 '25 14:04 wangshilong

I am a little confused by this:

IOV Buffer Merging Logic: Buffers smaller than 64 bytes are now merged into a larger contiguous buffer. Sequential buffers exceeding 512 entries (each ≤512KB) are consolidated into a single buffer to minimize IOV count. Original data is copied into the merged buffer, preserving content integrity.

You say sequential buffers exceeding 512 entries (I think you mean the SGL count here, not bytes) are consolidated. I'm curious why only in that case. What is wrong with consolidating a lower count of entries into a single buffer? That is actually essential.

512KB for the size also sounds puzzling to me, but maybe I misunderstood. Do you mean each combined SGL buffer should be less than 512K? What if the combined buffer is bigger? Do you abandon the merge?

mchaarawi avatar Apr 22 '25 15:04 mchaarawi

You say sequential buffers exceeding 512 entries (i think you mean here sgl count not bytes) are consolidated. im curious why in that case? what is wrong with consolidation

Yes, it currently checks both the IOV buffer count and the size of each IOV buffer to decide whether to merge. Since merging also introduces extra memory overhead, I want to be cautious about this in the first version.

512KB for size sounds also puzzling to me but maybe i misunderstood.. do you mean each combined SGL buffer should be less than 512K? what if the combined buffer is bigger? do you abandon the merge?

No, 512KB just means that a fragment this small is a candidate for merging. Currently the maximum size of one combined buffer is 16MiB.
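To make the trigger conditions discussed above concrete, here is a small sketch of how such a check could look. It is only an interpretation of the thresholds quoted in this thread (64 bytes, 512KB, 512 entries). The function name and constants are hypothetical, and the 16MiB cap on a combined buffer is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds as quoted in the discussion; the names are made up here. */
#define TINY_FRAG_BYTES   ((size_t)64)           /* "smaller than 64 bytes" */
#define SMALL_FRAG_BYTES  ((size_t)512 * 1024)   /* "each <= 512KB" */
#define MAX_SMALL_RUN     ((size_t)512)          /* "exceeding 512 entries" */

/*
 * Return true if a list of fragment lengths looks "unfriendly":
 * it has more than one fragment and either contains a tiny fragment,
 * or contains a run of more than MAX_SMALL_RUN consecutive small ones.
 */
bool
sgl_wants_merge(const size_t *lens, size_t nr)
{
	size_t run = 0, i;

	assert(lens != NULL || nr == 0);
	if (nr < 2)
		return false;	/* a single buffer has nothing to merge with */

	for (i = 0; i < nr; i++) {
		if (lens[i] < TINY_FRAG_BYTES)
			return true;
		if (lens[i] <= SMALL_FRAG_BYTES) {
			if (++run > MAX_SMALL_RUN)
				return true;
		} else {
			run = 0;	/* a large fragment breaks the run */
		}
	}
	return false;
}
```

When the check returns false, the caller would keep the original list untouched, which corresponds to the "no additional memory allocation" case in the PR description.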

wangshilong avatar Apr 23 '25 01:04 wangshilong

this PR is failing NLT because it causes a server segfault there:

Core was generated by `/home/chaarawi/install/daos/bin/daos_engine -t 4 -x 2 -f 0 -g daos_server -d /t'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055bb2c705b22 in setbit_range (end=4294967295, start=0, bitmap=0x7f0ec2316ed0 '\377' <repeats 200 times>...) at src/include/daos/common.h:247
247			setbit(bitmap, index);
[Current thread is 1 (Thread 0x7f0ecd26c640 (LWP 2282148))]
Missing separate debuginfos, use: dnf debuginfo-install daxctl-libs-71.1-8.el9.x86_64 glibc-2.34-60.el9_2.7.x86_64 hwloc-libs-2.4.1-5.el9.x86_64 kmod-libs-28-7.el9.x86_64 libaio-0.3.111-13.el9.x86_64 libgcc-11.3.1-4.3.el9.x86_64 libibverbs-44.0-2.el9.x86_64 libnl3-3.7.0-1.el9.x86_64 libunwind-1.6.2-1.el9.x86_64 libuuid-2.37.4-11.el9_2.x86_64 libyaml-0.2.5-7.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 ndctl-libs-71.1-8.el9.x86_64 numactl-libs-2.0.14-9.el9.x86_64 systemd-libs-252-14.el9_2.3.0.1.x86_64 xz-libs-5.2.5-8.el9_0.x86_64
(gdb) bt
#0  0x000055bb2c705b22 in setbit_range (end=4294967295, start=0, bitmap=0x7f0ec2316ed0 '\377' <repeats 200 times>...) at src/include/daos/common.h:247
#1  setbits64 (bits=0, at=0, bmap=0x7f0ec2316ed0) at src/include/daos/common.h:253
#2  sgls_set_merged_bitmap (ctx=ctx@entry=0x7f0dba3ee630, sg=sg@entry=0x7f0ec1dc3bc8, frag_start=frag_start@entry=0, frag_chain=frag_chain@entry=0, i=i@entry=0, j=0, bitmap_nr=1, update=true, dup=0x7f0dba3ee62f)
    at src/object/cli_obj.c:4627
#3  0x000055bb2c70dc97 in obj_sgls_dup (obj_auxi=0x7f0ec1ed4f38, args=args@entry=0x7f0ec1ed4ea8, update=update@entry=true) at src/object/cli_obj.c:4763
#4  0x000055bb2c726bb5 in dc_obj_update (task=task@entry=0x7f0ec1ed4e10, epoch=epoch@entry=0x7f0dba3ee720, map_ver=1, args=args@entry=0x7f0ec1ed4ea8, obj=0x7f0ec1f1a210) at src/object/cli_obj.c:6172
#5  0x000055bb2c727daa in dc_obj_update_task (task=0x7f0ec1ed4e10) at src/object/cli_obj.c:6281
#6  0x00007f0f03ee596d in tse_task_schedule_with_delay (task=task@entry=0x7f0ec1ed4e10, instant=instant@entry=true, delay=delay@entry=0) at src/common/tse.c:1047
#7  0x00007f0f03ee5bc2 in tse_task_schedule (task=task@entry=0x7f0ec1ed4e10, instant=instant@entry=true) at src/common/tse.c:1056
#8  0x000055bb2c6088aa in dsc_task_run (task=0x7f0ec1ed4e10, retry_cb=retry_cb@entry=0x0, arg=arg@entry=0x7f0dba3ee858, arg_size=arg_size@entry=8, sync=sync@entry=true) at src/engine/srv_cli.c:112
#9  0x00007f0eff39f96e in dsc_obj_update (oh=..., flags=flags@entry=0, dkey=dkey@entry=0x7f0ec1dc3b68, nr=1, iods=iods@entry=0x7f0ec1dc43c8, sgls=sgls@entry=0x7f0ec1dc3bc8) at src/object/srv_cli.c:131
#10 0x00007f0eff459138 in cont_send_oit_bucket (oa=oa@entry=0x7f0ec1dc3b50, bucket_id=<optimized out>, bucket_id@entry=366) at src/container/srv_oi_table.c:82
#11 0x00007f0eff45a221 in cont_child_gather_oids (coc=0x7f0ec1d71f00, coh_uuid=coh_uuid@entry=0x7f0dba3e8ba0 "\331>\340\375\016\225M\026\275\060\221zf\211", <incomplete sequence \340>, epoch=2223939632600186880, 
    oit_oid=...) at src/container/srv_oi_table.c:209
#12 0x00007f0eff43e673 in cont_snap_notify_one (vin=0x7f0dba3e8b80) at src/container/srv_target.c:2186
#13 0x000055bb2c6305ad in collective_func (varg=0x7f0ef20e2dd0) at src/engine/ult.c:56
#14 0x00007f0f03a0944c in ABTD_ythread_func_wrapper (p_arg=0x7f0dba3eede0) at arch/abtd_ythread.c:12
#15 0x00007f0f039f7d29 in ABTD_ythread_context_func_wrapper (p_fctx=<optimized out>) at ../src/include/abtd_fcontext.h:74
#16 0x0000000000000000 in ?? ()

mchaarawi avatar May 28 '25 18:05 mchaarawi

Thanks, I missed initializing the environment in the server case.

wangshilong avatar May 29 '25 02:05 wangshilong

Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/8/display/redirect

daosbuild3 avatar May 29 '25 05:05 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/9/display/redirect

daosbuild3 avatar May 29 '25 20:05 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16186/9/testReport/

daosbuild3 avatar May 29 '25 22:05 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/10/display/redirect

daosbuild3 avatar May 30 '25 06:05 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/10/execution/node/1299/log

daosbuild3 avatar May 30 '25 11:05 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/13/execution/node/1423/log

daosbuild3 avatar Jul 02 '25 05:07 daosbuild3

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/14/execution/node/1509/log

daosbuild3 avatar Jul 06 '25 21:07 daosbuild3

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16186/15/execution/node/441/log

daosbuild3 avatar Jul 07 '25 01:07 daosbuild3

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16186/16/display/redirect

daosbuild3 avatar Jul 07 '25 21:07 daosbuild3

can't see the results, but it passed 2.6 tests (except linting error), and also passed here (https://github.com/daos-stack/daos/pull/16566), so I'm landing it.

gnailzenh avatar Jul 08 '25 13:07 gnailzenh