Arun Chandran
> just for my understanding, do we have a repro (ucx_perftest, some osu tests..) with perf gain on applicable cpu?

Yes, I already shared the osu_latency numbers for a zen3...
[zen3_p2p_share.xlsx](https://github.com/openucx/ucx/files/15147212/zen3_p2p_share.xlsx)

> > just for my understanding, do we have a repro (ucx_perftest, some osu tests..) with perf gain on applicable cpu?
> >
> > saw description now, any perftest reproducer...
> > 1. By default it doesn't run in CI because need to pass "--enable-optimizations" to build with AVX support. Can we add CI step for this - build with...
> > I reduced the test_window_size to 4k and alignment to 64. This is enough to test all the scenarios and branches in the code.
> >
> > BTW, how long...
> Can we reduce the time 2x more, or skip the test altogether on ASAN?

I reduced test_window_size to 2K. This is the minimum size required; it covers the...
@yosefe @edgargabriel I will take a look at this issue. Thanks.
@yosefe, @mkre, @huaxiangfan I have some questions about the original PR [#8062](https://github.com/openucx/ucx/pull/8062), which introduced relaxed ordering on a subset of AMD CPUs (Naples, Rome, and Milan), and the observed regression...
@mkre is this issue seen only on AMD EPYC 7532 nodes? [Trying to understand whether it affects only Rome CPUs]
@mkre, could you please try running the performance benchmark [http://github.com/linux-rdma/perftest](https://github.com/linux-rdma/perftest) on these nodes? Did you observe a performance drop in the benchmark results with RO=1 for both UD and DC?...
In the tests below, I am not sure which test is relevant with "RO=1" for the DMA from NIC to memory. @yosefe @huaxiangfan, could you please suggest one? Meanwhile...