Edgar Gabriel
[ucx_info.txt](https://github.com/openucx/ucx/files/15390891/ucx_info.txt)
Any command line argument will trigger the issue on this platform, including just running ```./gtest```. gtest gets stuck in ```uct_test_base::enum_resources()``` (or rather in ```uct_md_query_tl_resources```, invoked from there) for the ```gga_mlx5``` tl. For whatever...
As an additional data point, running gtest through valgrind's memcheck tool does not show any memory corruption.
Sure, here it is: [config.log](https://github.com/openucx/ucx/files/15404973/config.log)
I think I have multiple hits at that breakpoint, that might be the difference:
```
$ gdb --args ./gtest
Reading symbols from ./gtest...
(gdb) break uct_tl_register
Function "uct_tl_register" not defined....
```
In fact, ```uct_tl_register``` is also invoked twice for uct_ib_component for each tl (uct_dc_mlx5_tl, uct_rc_verbs_tl, uct_rc_mlx5_tl, uct_ud_verbs_tl, uct_ud_mlx5_tl), so it's not just uct_gga_component that seems to be registered twice. Note that...
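To illustrate why a double registration could matter at all, here is a minimal sketch of how linking the same node twice into an intrusive circular list can make a later traversal spin forever. This is only an assumption about one possible failure mode, not a confirmed diagnosis of the hang described above. The names `node_t`, `list_add_tail`, `count_entries` and `walk_after_registrations` are hypothetical and are not taken from the UCX sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive circular doubly linked list (NOT the UCX code). */
typedef struct node {
    struct node *prev;
    struct node *next;
} node_t;

/* An empty list is a head that points to itself in both directions. */
static void list_init(node_t *head)
{
    head->prev = head;
    head->next = head;
}

/* Link elem in front of head, i.e. at the tail of the list.
 * There is deliberately no check whether elem is already linked. */
static void list_add_tail(node_t *head, node_t *elem)
{
    elem->prev       = head->prev;
    elem->next       = head;
    head->prev->next = elem;
    head->prev       = elem;
}

/* Walk the list and count its entries. Returns -1 if the walk does not
 * get back to head within 'limit' steps, which stands in for "stuck". */
static int count_entries(const node_t *head, int limit)
{
    const node_t *it;
    int count = 0;

    assert(limit > 0);
    for (it = head->next; it != head; it = it->next) {
        if (++count > limit) {
            return -1;
        }
    }
    return count;
}

/* Register one (hypothetical) transport node 'times' times, then walk. */
static int walk_after_registrations(int times)
{
    node_t head, tl;
    int i;

    list_init(&head);
    for (i = 0; i < times; ++i) {
        list_add_tail(&head, &tl);
    }
    return count_entries(&head, 16);
}
```

In this sketch `walk_after_registrations(1)` finds one entry, while `walk_after_registrations(2)` never gets back to the list head: the second `list_add_tail` makes the node's `next` pointer refer to the node itself. Whether the real component/tl lists behave like this when a tl is registered twice would have to be checked against the actual code.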
I will check on the other platform, but just to clarify, you would expect uct_ib_component to be registered twice with uct_dc_mlx5_tl, twice with uct_rc_verbs_tl, etc?
it would be good if we could have this PR merged in the near future, since this would simplify evaluating/testing the subsequent PRs
@wenduwan Assuming that you do not have AMD GPUs, I think the prerequisite for testing this PR would be for the accelerator/cuda component to implement the missing IPC functionality.
btw. I have no clue why the continuous-integration/jenkins/pr-head is marked as 'failed'. If you click on Details, all tests seem to have passed correctly.