[CI test only] v5.0.x - Scale test PR
Testing scale launch on v5.0.x via the IBM CI scale-testing trigger mechanism described at https://github.com/open-mpi/ompi/wiki/PRJenkins#ibm-ci-scale-testing-adjustment-triggers.
bot:notacherrypick
bot:ibm:scale:test
bot:notacherrypick
bot:ibm:scale:128:test
bot:ibm:scale:128:test
bot:ibm:scale:128:test
Thanks for fixing back-end-regex parsing for scale testing up to 128 virtual nodes.
bot:ibm:scale:32:test
bot:ibm:scale:128:test
The IBM CI (GNU/Scale) build failed! Please review the log, linked below.
Gist: https://gist.github.com/03d1d559d97d7feed001516b2cd44849
bot:aws:retest
Most of the scale tests at 128 pseudo-nodes passed, but ring_c timed out after 300s. It's unclear why.
I'll retry at 64, and keep an eye on it.
Run Scale Examples : timeout --preserve-status -k 310s 310s /workspace/exports/ompi/bin/mpirun --hostfile /workspace/hostfile.txt --npernode 2 --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 --mca pml ob1 --mca osc ^ucx --mca btl tcp,vader,self ring_c
: Failed [20] on ring_c with return code 256 (0:05:00)
Destroy Virtual Cluster : ...
: Passed (0:02:36)
-- : Some tests did not pass... (0:26:15)
---------------------------------------------------------------------------
######################################################################
########## Run Scale Examples
######################################################################
########################################
ssh c656f6n02 timeout --preserve-status -k 300s 300s docker exec -i --env WORKSPACE=/workspace -u 59674:59674 -w /workspace/ompi-src/examples ee90ba9915fe 'timeout --preserve-status -k 310s 310s /workspace/exports/ompi/bin/mpirun --hostfile /workspace/hostfile.txt --npernode 2 --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 --mca pml ob1 --mca osc ^ucx --mca btl tcp,vader,self ring_c'
########################################
Process 0 sending 10 to 1, tag 201 (256 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
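A rough back-of-the-envelope check on the ring_c output above, offered as a sketch rather than a diagnosis. It assumes that rank 0 prints one "decremented value" line each time the message completes a full lap of the ring, that no further output was still buffered when the job was killed, and that the whole 300s was spent passing the message (in practice launch time is included, so the per-hop figure is an upper bound). The function name `ring_progress` is hypothetical.

```python
def ring_progress(start_value, last_printed, num_procs, elapsed_s):
    """Estimate ring_c progress from the last value rank 0 printed.

    Assumption: each "decremented value" line marks one completed lap,
    i.e. num_procs point-to-point hops.
    """
    laps_done = start_value - last_printed
    if laps_done <= 0:
        raise ValueError("no completed laps to estimate from")
    hops_done = laps_done * num_procs
    # Upper bound on average hop time: elapsed_s also covers job launch.
    avg_hop_s = elapsed_s / hops_done
    # Laps needed to count down to 0; the final drain pass is ignored.
    est_total_s = avg_hop_s * start_value * num_procs
    return laps_done, hops_done, avg_hop_s, est_total_s


# Values read from the log: start at 10, last printed 4, 256 processes, 300s.
laps, hops, hop_s, total_s = ring_progress(10, 4, 256, 300.0)
print(f"{laps} laps, {hops} hops, <= {hop_s * 1000:.0f} ms/hop, "
      f"~{total_s:.0f}s for a full run at this rate")
```

Under these assumptions the run was still making progress when it was killed (6 of 10 laps), which would point to slow message passing at this scale rather than a hang, though the log alone cannot confirm that.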
bot:ibm:scale:64:test
64 worked well. Let's try 128 again.
bot:ibm:scale:128:test
The IBM CI (GNU/Scale) build failed! Please review the log, linked below.
Gist: https://gist.github.com/f7a998cd453cfb31eaf89852d654cd41
bot:ibm:scale:64:test
bot:ibm:scale:64:test
I just rebased to latest v5.0.x along with latest submodule pointers. Once this passes CI I'll rerun scale testing.
bot:ibm:retest bot:ibm:nodes:32:test bot:ibm:ppn:2:test
bot:ibm:scale:64:test
bot:ibm:scale:64:test