Moritz Kreutzer
@bartoldeman, thanks a lot for your input. Your usage scenario is pretty similar to ours. We are already applying quite a few of these manual MCA settings to prevent different...
@rapiz1, as noted above, I don't want to pollute my general SSH environment with non-standard paths that only the clangd extension relies on. I am currently using a wrapper...
Specifying `--enable-mca-dso=common-ofi,mtl-ofi,btl-ofi` works, thanks @bwbarrett! I wasn't aware of the (new?) BTL/ofi component while I was composing the explicit DSO list based on the MCA parameters we had in Open...
Hi @bastienwirtz, when you get a chance it would be great if you could have a look here :-).
The regression is still present with 1.15.0-rc1. I verified this using the binary distribution ucx-1.15.0-rc1-centos8-mofed5-cuda11-x86_64.tar.bz2 to rule out any build or configuration issues on my end.
@yosefe, I can try. Please give me some time.
@yosefe, I did a `git bisect` and interestingly, it points to commit 7bafc7c201ec14d40e9526fa1f77325fe8d473d2. Does that make any sense to you? I have to say that the bisection process wasn't crystal...
The reported timings are average timings over multiple iterations, not including initialization or startup time. That is, we call `MPI_Init()`, let the code run for some warmup iterations, then for...
Hi @yosefe, I did another bisection using a different test case and it again pointed me to 7bafc7c201ec14d40e9526fa1f77325fe8d473d2. Perhaps the changed frequency measurement results in different tuning parameters, causing...
Reverting the commit helped in my small-scale bisection tests, but not at larger scales. I will do another bisection on a larger node count. There must be another offending commit....