UCT/IB: Fix leak if some operations failed after uct_ib_md_open_common()

Open dmitrygx opened this issue 3 years ago • 7 comments

What

Fix leak if some operations failed after uct_ib_md_open_common().

Why ?

If an operation fails after uct_ib_md_open_common() has succeeded, the changes made by uct_ib_md_open_common() are not rolled back, so the resources it acquired are leaked.

How ?

  1. Introduce uct_ib_md_close_common(), which performs the reverse of the actions done by uct_ib_md_open_common().
  2. Update uct_ib_mlx5_devx_md_open() and uct_ib_mlx5dv_md_open() to invoke uct_ib_md_close_common() if an operation fails after uct_ib_md_open_common() was invoked.
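
The two steps above follow the usual C "goto cleanup" rollback pattern. Below is a minimal sketch of that pattern, not the actual UCX code: every name prefixed with sketch_ is hypothetical, the resources are plain malloc() buffers standing in for whatever uct_ib_md_open_common() really acquires, and the fail_late flag only simulates a failure after the common open step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the memory domain; NOT the real uct_ib_md_t. */
typedef struct {
    void *common_res;   /* acquired by the common open step */
    void *provider_res; /* acquired by the provider-specific open step */
} sketch_md_t;

/* Number of outstanding allocations, used only to observe leaks. */
static int live_allocs = 0;

static void *sketch_alloc(void)
{
    void *ptr = malloc(16);
    if (ptr != NULL) {
        live_allocs++;
    }
    return ptr;
}

static void sketch_free(void *ptr)
{
    if (ptr != NULL) {
        free(ptr);
        live_allocs--;
    }
}

/* Plays the role of uct_ib_md_open_common(): acquires the shared resource. */
static int sketch_md_open_common(sketch_md_t *md)
{
    md->common_res = sketch_alloc();
    return (md->common_res != NULL) ? 0 : -1;
}

/* Plays the role of the proposed uct_ib_md_close_common(): undoes exactly
 * what sketch_md_open_common() did. */
static void sketch_md_close_common(sketch_md_t *md)
{
    assert(md != NULL);
    sketch_free(md->common_res);
    md->common_res = NULL;
}

/* Plays the role of a provider open function such as
 * uct_ib_mlx5_devx_md_open(). A nonzero fail_late simulates a failure that
 * happens after the common open step has already succeeded. */
static int sketch_provider_md_open(sketch_md_t *md, int fail_late)
{
    int ret;

    md->common_res   = NULL;
    md->provider_res = NULL;

    ret = sketch_md_open_common(md);
    if (ret != 0) {
        goto err;
    }

    if (fail_late) {
        ret = -1;
        goto err_close_common; /* roll back the common step */
    }

    md->provider_res = sketch_alloc();
    if (md->provider_res == NULL) {
        ret = -1;
        goto err_close_common;
    }

    return 0;

err_close_common:
    sketch_md_close_common(md);
err:
    return ret;
}

/* Normal teardown: release in reverse order of acquisition. */
static void sketch_provider_md_close(sketch_md_t *md)
{
    sketch_free(md->provider_res);
    md->provider_res = NULL;
    sketch_md_close_common(md);
}
```

In this sketch, calling sketch_provider_md_open(&md, 1) returns an error and leaves live_allocs at 0. Without the err_close_common label, the buffer acquired by the common step would stay allocated, which is the kind of leak this PR addresses.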

dmitrygx avatar Aug 19 '22 10:08 dmitrygx

@Artemy-Mellanox can you pls review?

yosefe avatar Aug 21 '22 13:08 yosefe

@Artemy-Mellanox @brminich could you review pls?

dmitrygx avatar Aug 24 '22 12:08 dmitrygx

@Artemy-Mellanox @brminich @yosefe could you review pls?

dmitrygx avatar Aug 30 '22 14:08 dmitrygx

@yosefe could you review pls?

dmitrygx avatar Sep 01 '22 15:09 dmitrygx

@yosefe could you review pls?

dmitrygx avatar Sep 06 '22 05:09 dmitrygx

@Artemy-Mellanox @yosefe could you review pls?

dmitrygx avatar Sep 12 '22 13:09 dmitrygx

@Artemy-Mellanox @yosefe could you review pls?

dmitrygx avatar Sep 19 '22 05:09 dmitrygx

@Artemy-Mellanox could you review pls?

dmitrygx avatar Sep 22 '22 14:09 dmitrygx

@Artemy-Mellanox could you review pls?

dmitrygx avatar Sep 25 '22 08:09 dmitrygx

@yosefe could you review pls?

dmitrygx avatar Sep 26 '22 07:09 dmitrygx

@yosefe could you review pls?

dmitrygx avatar Oct 02 '22 07:10 dmitrygx

@dmitrygx can you pls fix the new conflict?

yosefe avatar Oct 14 '22 07:10 yosefe

@dmitrygx can you pls fix the new conflict?

@yosefe thanks for noticing, done

dmitrygx avatar Oct 14 '22 10:10 dmitrygx

On an AWS machine with an EFA TCP NIC:

[  PASSED  ] 11065 tests.
[  FAILED  ] 32 tests, listed below:
[  FAILED  ] test_uct_ib_sl_utils.query_ooo_sl_mask, where TypeParam =  and GetParam() = 
[  FAILED  ] ib/test_md.rkey_ptr/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.alloc/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.mem_type_detect_mds/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.mem_query/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.sys_device/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.reg/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.reg_perf/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.reg_advise/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.alloc_advise/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.reg_multi_thread/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.sockaddr_accessibility/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.invalidate/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.reg_bad_arg/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.dereg_bad_arg/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md.exported_mkey/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_md_fork.fork/0, where GetParam() = rdmap0s6
[  FAILED  ] alloc_methods/test_mem.md_alloc/0, where GetParam() = 0
[  FAILED  ] alloc_methods/test_mem.md_alloc/1, where GetParam() = 2
[  FAILED  ] alloc_methods/test_mem.md_alloc/2, where GetParam() = 3
[  FAILED  ] alloc_methods/test_mem.md_alloc/3, where GetParam() = 4
[  FAILED  ] alloc_methods/test_mem.md_fixed/0, where GetParam() = 0
[  FAILED  ] alloc_methods/test_mem.md_fixed/1, where GetParam() = 2
[  FAILED  ] alloc_methods/test_mem.md_fixed/2, where GetParam() = 3
[  FAILED  ] alloc_methods/test_mem.md_fixed/3, where GetParam() = 4
[  FAILED  ] cma/test_uct_perf.envelope/0, where GetParam() = cma/memory
[  FAILED  ] cma/test_uct_loopback.envelope/0, where GetParam() = cma/memory
[  FAILED  ] ib/test_ib_md.ib_md_umr_rcache/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_ib_md.ib_md_umr_direct/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_ib_md.ib_md_umr_ksm/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_ib_md.relaxed_order/0, where GetParam() = rdmap0s6
[  FAILED  ] ib/test_ib_md.aligned/0, where GetParam() = rdmap0s6

shamisp avatar Oct 15 '22 16:10 shamisp

@yosefe squashed, could you review pls?

dmitrygx avatar Oct 16 '22 08:10 dmitrygx