Edgar Gabriel

Results 137 comments of Edgar Gabriel

@koomie in fact, if you have time, could you test your code with the ucx-1.14.0-rc2 release to see whether the issue is resolved there? That would be valuable input/feedback! (See...

just for documentation purposes, we are communicating and working on it off the list.

It's a very fundamental problem that was also discussed in the UCC developers meetings. I can't recall a ROCm version where I did not see this warning, but the most...

I am confused: you are talking about ROCm 5.7.1, but then you set --with-cuda=... (instead of --with-rocm=...) and --with-nccl (instead of --with-rccl=...). cuda/nccl are for NVIDIA GPUs, rocm/rccl are...

@devreal thank you for all of this work! Can I make a suggestion? This is a *massive* PR as it is at the moment. Could we try to break it...

I performed some tests and measurements with this component. I think the performance improvements are very significant and we should try to merge this in the near future, before we...

> > configure:27839: checking for rocm_smi/rocm_smi.h
> > configure:27839: gcc -c -g -O2 -I/opt/rocm/rocm_smi/include/ conftest.c >&5
> > configure:27839: $? = 0
> > configure:27839: result: yes
> > configure:27847:...

@bgoglin you are correct, it works with ROCm 6.0.2. ROCm 6.0.0 and 6.0.1 unfortunately had a bug in the rocm_smi.h header file that prevented compilation of hwloc (or any C...

No, it's the other way around. With mpirun there is no warning, only with singleton.

I think so, I think I saw it a few days ago. I can double check in a few days and look into it; it's coming from the btl.smcuda component...