
Segfault at thread exit with latest main branch

Open SeyedMir opened this issue 3 years ago • 4 comments

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

Main branch commit 4265e248c

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed from git clone.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

$ git submodule status
 6692c28a4daed5e99443eb724231d7300287fb2c ../3rd-party/openpmix (v1.1.3-3495-g6692c28a)
 7ae2c083189db0881d2eff29d71bd507be02bad3 ../3rd-party/prrte (psrvr-v2.0.0rc1-4340-g7ae2c08318)

Please describe the system on which you are running

  • Operating system/version: Ubuntu 18.04.5, kernel 4.15.0-142-generic
  • Computer hardware:
  • Network type:

Details of the problem

I have a use case where I initialize and finalize MPI from a dynamically loaded .so plugin file. Building the plugin against the latest ompi main branch leads to a segfault when a pthread exits after the plugin has been finalized and closed. The same code works fine with 4.1.2.

Steps to reproduce:

  1. Create the following files: test.c, plugin.h, and mpi_plugin.c
$ cat test.c
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>
#include <assert.h>
#include <pthread.h>
#include "plugin.h"

static void *plugin_hdl;

int my_init(const char *plugin, handle_t *handle) {
  int (*plugin_init)(handle_t *handle);
  int status = 0;

  plugin_hdl  = dlopen(plugin, RTLD_NOW);
  assert(plugin_hdl);

  plugin_init = dlsym(plugin_hdl, "plugin_init");

  status = plugin_init(handle);
  assert(!status);

  printf("initialized\n");
  return status;
}

int my_finalize(handle_t *handle) {
  int status = handle->finalize(handle);
  assert(!status);

  dlclose(plugin_hdl);

  printf("finalized\n");
  return 0;
}

void *t_func(void *arg)
{
    printf("thread started\n");
    sleep(1);
    printf("thread exiting\n");
    return NULL;
}

int main(int argc, char **argv)
{
    handle_t handle;

    int rc;
    pthread_t t1;

    rc = my_init("test_plugin.so", &handle);
    assert(!rc);

    rc = my_finalize(&handle);
    assert(!rc);

    rc = pthread_create(&t1, NULL, t_func, NULL);
    assert(!rc);

    pthread_join(t1, NULL);

    return 0;
}
$ cat plugin.h
#ifndef PLUGIN_H
#define PLUGIN_H

typedef struct handle {
  int pg_rank;
  int pg_size;
  int (*finalize)(struct handle *handle);
} handle_t;

int plugin_init(handle_t *handle);

#endif
$ cat mpi_plugin.c
#include <assert.h>
#include <mpi.h>
#include "plugin.h"

static int mpi_finalize(handle_t *handle) {
  int rc = MPI_Finalize();
  assert(rc == MPI_SUCCESS);
  return 0;
}

int plugin_init(handle_t *handle) {
  int rc = MPI_Init(NULL, NULL);
  assert(rc == MPI_SUCCESS);
  handle->finalize = mpi_finalize;
  return 0;
}
  2. mpicc -shared -fPIC -o test_plugin.so mpi_plugin.c
  3. gcc test.c -ldl -pthread
  4. mpirun -n 1 ./a.out
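
For anyone trying to narrow this down, here is a minimal sketch of a loader variant. It is not part of the original reproducer: `open_plugin` and `find_symbol` are hypothetical helper names, and the idea that the crash involves code unmapped by `dlclose` is an untested assumption, not something established in this report. The helpers print the `dlerror()` text instead of only asserting, and `keep_mapped` adds glibc's `RTLD_NODELETE` so that the library stays mapped after `dlclose`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not in the original test.c): open a plugin and
 * report the loader's error message on failure.
 * keep_mapped != 0 adds RTLD_NODELETE, so a later dlclose() does not
 * unmap the library. This is a diagnostic switch only. */
static void *open_plugin(const char *path, int keep_mapped)
{
    assert(path != NULL);

    int flags = RTLD_NOW;
    if (keep_mapped)
        flags |= RTLD_NODELETE;

    void *hdl = dlopen(path, flags);
    if (!hdl)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return hdl;
}

/* Hypothetical helper: look up a symbol and report lookup errors.
 * dlerror() is called first to clear any stale error state, because a
 * NULL return from dlsym() alone does not always mean failure. */
static void *find_symbol(void *hdl, const char *name)
{
    assert(hdl != NULL && name != NULL);

    dlerror();
    void *sym = dlsym(hdl, name);
    const char *err = dlerror();
    if (err) {
        fprintf(stderr, "dlsym(%s) failed: %s\n", name, err);
        return NULL;
    }
    return sym;
}
```

Replacing the `dlopen` call in `my_init` with `open_plugin(plugin, 1)` and rerunning would show whether the segfault depends on the plugin being unmapped. If the crash persists with the library kept mapped, that hypothesis can be dropped.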

SeyedMir, Apr 04 '22 19:04

@streichler FYI

SeyedMir, Apr 04 '22 19:04

I am unable to reproduce. Are you using an external HWLOC installation perchance?

Can you share a backtrace of the segv (compiled with -g)?

awlauria, Apr 08 '22 13:04

I build with --with-hwloc=internal. I also tested on another system and got the same segfault. I'm on commit 49460b41. This is the backtrace I get:

#0  0x00007ffff7f87580 in ?? ()
#1  0x00007ffff79b78b9 in advise_stack_range (guardsize=<optimized out>, pd=140737061897984, size=<optimized out>, mem=0x7fffe614c000) at allocatestack.c:386
#2  start_thread (arg=0x7fffe694c700) at pthread_create.c:552
#3  0x00007ffff76e071f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

SeyedMir, Apr 08 '22 18:04

Thanks. Can you share your config.log? I am unable to reproduce on either a RHEL box or Ubuntu 18.04.

awlauria, Apr 18 '22 19:04

It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.

github-actions[bot], May 02 '24 17:05

Per the above comment, it has been a month with no reply on this issue. It looks like this issue has been abandoned.

I'm going to close this issue. If I'm wrong and this issue is not abandoned, please feel free to re-open it. Thank you!

github-actions[bot], May 16 '24 21:05