Memory regression in UCX 1.16.0 due to changed FIFO size
We are seeing memory regressions with UCX 1.16.0 compared to UCX 1.14.0 on our AMD InfiniBand cluster. I did a git bisection and the offending commit is 5c1145d575d7ac58ec27e752d428b4efa697552b. The regression goes away after setting UCX_MM_FIFO_SIZE=64 (which is the old value before this commit).
Is this just an expected tradeoff between performance and memory? We are currently considering setting UCX_MM_FIFO_SIZE=64 globally when upgrading to UCX 1.16.0 to avoid the regressions. Under which circumstances would the new default of 256 lead to better performance (i.e., when should we refrain from overriding it with the old value)?
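For reference, by "globally" we mean exporting the variable to every rank at launch time. A minimal sketch, assuming Open MPI's mpirun as the launcher (the rank count and binary name are placeholders):

```sh
# Restore the pre-1.16 shared-memory FIFO size for all ranks.
# Open MPI's "-x" flag forwards the variable to every launched process.
export UCX_MM_FIFO_SIZE=64
mpirun -np 128 -x UCX_MM_FIFO_SIZE ./my_app
```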
@mkre We saw that FIFO_SIZE=256 improves performance with Alltoall benchmarks at certain message sizes. cc @tvegas1
OK, so I guess we need to run our performance test cases with 64 and 256 and try to find the best tradeoff between performance and memory consumption?
So far we are seeing memory regressions of up to 10% (even for relatively small node counts of 2-4), which is pretty significant, so we will likely have to fall back to the old setting.
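Something like the following sweep is what I have in mind; a sketch assuming Open MPI's mpirun and osu_alltoall from the OSU micro-benchmarks (rank count and paths are placeholders):

```sh
# Compare alltoall latency with the old (64) and new (256) FIFO sizes.
for fifo in 64 256; do
  echo "=== UCX_MM_FIFO_SIZE=$fifo ==="
  mpirun -np 128 -x UCX_MM_FIFO_SIZE=$fifo ./osu_alltoall
done
```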
This will help in the presence of intra-node communication; there is a graph in the original description of PR #9194 showing the performance gain for each rank count. For instance, with >= 128 ranks, latency on osu alltoall for small message sizes (~2kB) is more than halved.
What is the amount of added memory per rank in bytes? In principle, the increase per rank should be constant regardless of the number of ranks. Going from 64 to 256 could add ~1.5MB/rank of memory consumption ((256-64)*8kB).
You could also try setting SEG_SIZE to 4kB and see if it helps without performance degradation.
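To spell out the arithmetic (assuming the ~8kB default segment size used in the estimate above): the per-rank FIFO grows from 64 * 8kB = 512kB to 256 * 8kB = 2MB, i.e. (256-64) * 8kB ≈ 1.5MB extra per rank. With 4kB segments, the same 256-entry FIFO would occupy 256 * 4kB = 1MB instead. A sketch of the combined override:

```sh
# Keep the new, larger FIFO but shrink each segment from ~8kB to 4kB.
# This halves the per-rank FIFO footprint; the likely tradeoff is that
# larger messages get split across more segments.
export UCX_MM_FIFO_SIZE=256
export UCX_MM_SEG_SIZE=4096
```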
Thanks for getting back to me, @tvegas1.

> In principle, the increase per rank should be constant regardless of the number of ranks.

That's good information to have. I will check again whether it is a constant per-rank overhead. That being said, to be on the safe side we might still want to go back to FIFO_SIZE=64 if our performance tests do not indicate any improvement from the new default (e.g., because alltoall performance is not a limiting factor for us).

> You could also try setting SEG_SIZE to 4kB and see if it helps without performance degradation.

I will run some tests with UCX_MM_SEG_SIZE=4096 and report back with the results.
@tvegas1 Setting UCX_MM_SEG_SIZE=4096 removes some of the regression. I will run a complete test sweep with FIFO_SIZE=64 and then decide whether we can accept the new default behavior of FIFO_SIZE=256 (e.g., if the memory increase is just a constant per-rank overhead and we do get some performance benefit), or manually go back to the old FIFO size.
Thanks for your help here. From my point of view, we can close this ticket, as you gave a good explanation of why this change was made and what to expect from it.