Howard Pritchard

238 comments by Howard Pritchard

Okay, let's try one more thing. Could you try our [nightly main tarball](https://download.open-mpi.org/nightly/open-mpi/main/openmpi-main-202406080241-2aef545.tar.gz)? I'm beginning to think that you are hitting a different problem on your system that I'm not...

pinged some folks about this.

Thanks for trying this. It looks like some of these may well be attributable to Open MPI. Most of the functions caught will only be run once in the call...

Could you rerun the test with

```
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=^uct
```

and see if the test passes?

@rhc54 thanks for the Docker container suggestion, but it isn't a substitute for understanding and hopefully addressing this TCP issue. @JBlaschke do you know if this is reproducible outside of the container?...

I guess this will have to wait, as PM is undergoing maintenance today.

Has this PR been rebased on main? The mpi4py failure looks like it needs rebasing.

Here's the traceback:

```
#0  0x00007ffff71ec82d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff71e5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007ffff17c1c18 in opal_thread_internal_mutex_lock (p_mutex=0x7ffff1acd6e8) at ../../../opal/mca/threads/pthreads/threads_pthreads_mutex.h:82
#3  0x00007ffff17c1ccc in opal_mutex_lock...
```

Another option would be to use an ```opal_recursive_mutex_t```-type mutex. Probably removing the locking (which does seem buggy) would cause MPI_T to not support MPI_THREAD_MULTIPLE. I suspect this is the first...
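For illustration, here is a minimal sketch in plain pthreads (not the actual OPAL API; the ```inner()``` helper and lock name are hypothetical) of the pattern the traceback above suggests: re-locking a non-recursive mutex from the same thread hangs in ```__lll_lock_wait```, while a recursive mutex, which is roughly what an ```opal_recursive_mutex_t``` provides, lets the owning thread re-enter safely.

```c
/* Sketch only: plain pthreads analogue of using a recursive mutex so that
 * a nested acquisition by the same thread does not self-deadlock. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t recursive_lock;

/* Hypothetical inner helper that also takes the lock, standing in for a
 * code path (e.g. an MPI_T callback) re-entered while the lock is held. */
static void inner(void)
{
    pthread_mutex_lock(&recursive_lock);   /* would hang forever with a default mutex */
    printf("inner: re-acquired the lock without blocking\n");
    pthread_mutex_unlock(&recursive_lock);
}

int main(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&recursive_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&recursive_lock);   /* outer acquisition */
    inner();                               /* nested acquisition succeeds because the mutex is recursive */
    pthread_mutex_unlock(&recursive_lock);

    pthread_mutex_destroy(&recursive_lock);
    return 0;
}
```

The trade-off is the one noted above: a recursive mutex keeps MPI_T usable with multiple threads, whereas simply removing the (apparently buggy) locking would drop that thread-safety guarantee.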