oneCCL
Allreduce cpu example fails with CCL_WORKER_COUNT > 1
I started experimenting with the allreduce example from the main repository: https://github.com/oneapi-src/oneCCL/blob/master/examples/cpu/cpu_allreduce_test.cpp
I modified it slightly by increasing the buffer size 100 times:
diff --git a/examples/cpu/cpu_allreduce_test.cpp b/examples/cpu/cpu_allreduce_test.cpp
index 6e9ac4d..5dfe2d9 100644
--- a/examples/cpu/cpu_allreduce_test.cpp
+++ b/examples/cpu/cpu_allreduce_test.cpp
@@ -22,7 +22,7 @@
using namespace std;
int main() {
- const size_t count = 4096;
+ const size_t count = 4096*100;
size_t i = 0;
When I run it with the CCL_WORKER_COUNT environment variable set to a value greater than 1, it fails with the following errors:
piotrc@machine:~/ws/oneCCL/build$ CCL_WORKER_COUNT=2 mpirun -np 2 examples/cpu/cpu_allreduce_test
[1705415958.879795729] machine:rank1.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
[1705415958.879801821] machine:rank1.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
machine:rank1: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen
machine:rank1: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 559315 RUNNING AT gbnwp-pod023-1
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 559316 RUNNING AT gbnwp-pod023-1
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
With CCL_WORKER_COUNT=1 it works perfectly.
piotrc@machine:~/ws/oneCCL/build$ mpirun -np 2 examples/cpu/cpu_allreduce_test
PASSED
What am I doing wrong? Why does it fail? Should I use specific flags when compiling, set a specific environment variable, or pass a specific option to mpirun? It is worth mentioning that with a smaller buffer size (for example 4096 * 10) everything works fine, even with CCL_WORKER_COUNT set to a value greater than 1.
Attached: CCL_LOG_LEVEL=info log (logs.txt) and CCL_LOG_LEVEL=debug log (logs_debug.txt).
Possible workaround:
FI_PROVIDER=verbs CCL_WORKER_COUNT=2 ../../install/bin/mpirun -np 2 ../../install/examples/cpu/cpu_allreduce_test
PASSED
FI_PROVIDER=tcp CCL_WORKER_COUNT=2 ../../install/bin/mpirun -np 2 ../../install/examples/cpu/cpu_allreduce_test
PASSED
@piotrchmiel Hi. Your fi_info output should list psm3 as available. Can you run it and check whether it does? https://github.com/oneapi-src/oneCCL/tree/master/deps/ofi/bin Also, could you describe how you compiled oneCCL?
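One way to run the provider check requested above is sketched here. It assumes that fi_info is on PATH (the copy under deps/ofi/bin in the oneCCL tree would need to be added to PATH or called by its full path) and that its default output starts each entry with a "provider: <name>" line. The helper name has_provider is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the fi_info-style text on stdin
# contains an entry for the provider named in $1.
# Assumes entries begin with "provider: <name>", optionally followed
# by ";<layered provider>" (for example "verbs;ofi_rxm").
has_provider() {
    grep -Eq "^provider: $1(;|\$)"
}

if command -v fi_info >/dev/null 2>&1; then
    # Check the three providers mentioned in this thread.
    for p in psm3 verbs tcp; do
        if fi_info 2>/dev/null | has_provider "$p"; then
            echo "$p: reported by fi_info"
        else
            echo "$p: not reported by fi_info"
        fi
    done
else
    echo "fi_info not found in PATH"
fi
```

If psm3 is reported, it is plausibly the provider being picked up by default in the failing run, which would be consistent with the psm3/ptl_am assertion in the log; that is an inference from the output, not something the check proves.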
@piotrchmiel, you can try this: echo 0 > /proc/sys/kernel/yama/ptrace_scope
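Before changing the setting, it may help to see what it currently is. The "Reading from remote process' memory failed. Disabling CMA support" lines in the log suggest that cross-process memory reads are being denied, and Yama's ptrace_scope is one common cause: at value 1, a process may only attach to its own descendants, which sibling MPI ranks are not. The read-only sketch below needs no root; the function name describe_ptrace_scope is made up for this illustration, and whether this setting is the actual cause here is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map a Yama ptrace_scope value to a short label.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic" ;;      # same-user processes may attach to each other
        1) echo "restricted" ;;   # only descendants may be attached to
        2) echo "admin-only" ;;   # attaching requires CAP_SYS_PTRACE
        3) echo "no-attach" ;;    # attaching is disabled entirely
        *) echo "unknown" ;;
    esac
}

scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
    value=$(cat "$scope_file")
    echo "ptrace_scope=$value ($(describe_ptrace_scope "$value"))"
else
    # The file is absent when the kernel was built without Yama.
    echo "ptrace_scope not available on this system"
fi
```

Writing 0 to the file requires root (for example via `echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope`) and does not persist across reboots. It also relaxes a security restriction for the whole machine, so it is worth reverting after the test.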