Arun Chandran comments

Results: 22 comments by Arun Chandran

ARCH/X86: Introduce non temporal buffer transfer

> just for my understanding, do we have a repro (ucx_perftest, some osu tests..) with perf gain on applicable cpu? Yes, I already shared the osu_latency numbers for a zen3...

ARCH/X86: Introduce non temporal buffer transfer

[zen3_p2p_share.xlsx](https://github.com/openucx/ucx/files/15147212/zen3_p2p_share.xlsx) > > just for my understanding, do we have a repro (ucx_perftest, some osu tests..) with perf gain on applicable cpu? > > saw description now, any perftest reproducer...

ARCH/X86: Introduce non temporal buffer transfer

> > 1. By default it doesn't run in CI because need to pass "--enable-optimizations" to build with AVX support. Can we add CI step for this - build with...

ARCH/X86: Introduce non temporal buffer transfer

> > I reduced the test_window_size to 4k and alignment to 64. This is enough to test all the scenarios and branches in the code. > > BTW, how long...

ARCH/X86: Introduce non temporal buffer transfer

> Can we reduce the time 2x more, or skip the test altogether on ASAN? I reduced test_window_size to 2K. This is the minimum size required; it covers the...

Performance regression UCX 1.10.0 -> 1.14.0 on AMD system

@yosefe @edgargabriel I will take a look at this issue. Thanks.

Performance regression UCX 1.10.0 -> 1.14.0 on AMD system

@yosefe, @mkre, @huaxiangfan I have some questions related to the original PR [#8062](https://github.com/openucx/ucx/pull/8062), which introduced relaxed ordering on a subset of AMD CPUs (Naples, Rome, and Milan), and the observed regression...

Performance regression UCX 1.10.0 -> 1.14.0 on AMD system

@mkre is this issue seen only on AMD EPYC 7532 nodes? [Trying to understand whether it affects only Rome CPUs]

Performance regression UCX 1.10.0 -> 1.14.0 on AMD system

@mkre, could you please try running the performance benchmark [http://github.com/linux-rdma/perftest](https://github.com/linux-rdma/perftest) on these nodes? Did you observe a performance drop in the benchmark results with RO=1 for both UD and DC?...

Performance regression UCX 1.10.0 -> 1.14.0 on AMD system

In the tests below, I am not sure which test is relevant with "RO=1" for the DMA from NIC to memory. @yosefe @huaxiangfan, could you please suggest? Meanwhile...