Moritz Kreutzer

21 comments by Moritz Kreutzer

Hi @yosefe, I finally got back to this. I did another bisection using a larger case ("jetnoise") on four nodes of our AMD system. The regression is pretty severe for...

Yes, it does. I will run our entire test suite with that setting.

@yosefe, it seems like fb6401c8c760b2601368f9209dfc3527b87cc71c might be another offender (not sure yet). Is there an environment variable to roll back to the old behavior (a revert doesn't work) so that I...

@yosefe, we decided to go ahead with patching out 7bafc7c201ec14d40e9526fa1f77325fe8d473d2 from 1.14.0 and setting `UCX_IB_PCI_RELAXED_ORDERING=n` to avoid any regressions with the new UCX version. The PCI relaxed ordering regression seems...

@yosefe With UCX 1.16.0, we still need to set `UCX_IB_PCI_RELAXED_ORDERING=n` on our AMD cluster to avoid performance regressions. Are you planning to address this issue?
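For anyone who wants to apply the same workaround from a job script, here is a minimal sketch of launching a command with `UCX_IB_PCI_RELAXED_ORDERING=n` set in its environment. The helper names `ucx_env` and `run_with_workaround` are made up for illustration; only the variable name and value come from the comments above. With an MPI launcher, the variable may also need to be forwarded to the remote ranks, which depends on the launcher and is not shown here.

```python
import os
import subprocess
import sys


def ucx_env(base=None):
    """Return a copy of an environment with UCX PCI relaxed ordering disabled.

    `base` defaults to the current process environment; it is never modified.
    (Hypothetical helper, not part of UCX.)
    """
    env = dict(os.environ if base is None else base)
    env["UCX_IB_PCI_RELAXED_ORDERING"] = "n"
    return env


def run_with_workaround(cmd):
    """Run `cmd` (a list of arguments) with the workaround applied.

    (Hypothetical helper.) Raises CalledProcessError if the command fails.
    """
    return subprocess.run(cmd, env=ucx_env(), check=True)


if __name__ == "__main__":
    # Stand-in for the real application: a child process that echoes the variable.
    child = "import os; print(os.environ['UCX_IB_PCI_RELAXED_ORDERING'])"
    run_with_workaround([sys.executable, "-c", child])
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the launched command, which makes it easy to compare runs with and without the workaround from the same script.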

> Additionally, @mkre mentioned that the same commit causing the performance drop is observed only with UD (DC is fine) on AMD EPYC 7532 nodes based on the Rome architecture....

@arun-chandran-edarath,

> is this issue seen only on AMD EPYC 7532 nodes? [Trying to understand whether it affects only Rome CPUs]

I don't know, as I don't have any other...

So far I could not see any reliable improvement with `--disable_pcie_relaxed` in my tests (tried a few different parameters, bw / latency measurements, different transports). The parameter space is quite...

OK, so I guess we need to run our performance test cases with 64 and 256 and try to find the best tradeoff between performance and memory consumption? So far,...

Thanks for getting back @tvegas1.

> In principle, augmentation per rank should be constant regardless of number of ranks.

That's good information to have. I will check again whether it...