MPI_T Events

Open cchambreau opened this issue 5 years ago • 49 comments

This PR includes Nathan Hjelm's reworking of PERUSE functionality to implement the MPI_T Events API as specified in the MPI 4.0 proposal. Subsequent minor changes to the MPI_T Events API were made and are included here. The API and an initial set of events were tested against Marc-André Hermanns's MEL tracing tool and other test cases. The history, a description of the API, and experience with tools can be found in "Enabling callback-driven runtime introspection via MPI_T."

cchambreau avatar Sep 22 '20 21:09 cchambreau

Can one of the admins verify this patch?

ompiteam-bot avatar Sep 22 '20 21:09 ompiteam-bot

ok to test

jsquyres avatar Sep 22 '20 21:09 jsquyres

@cchambreau, hi. Is this desirable for Open MPI v5.0 (not MPI-4.0 compliant)? If so, please rebase it and get it into master, then cherry-pick it to the v5.0.x branch once it's in master.

gpaulsen avatar Mar 26 '21 15:03 gpaulsen

@cchambreau could you please rebase? thanks.

hppritcha avatar Sep 21 '21 15:09 hppritcha

Howard will rebase and merge unless objections are raised.

hppritcha avatar Nov 16 '21 16:11 hppritcha

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/d182f369f64bb66a239cbf4b56b1e3ce

ibm-ompi avatar Dec 23 '21 23:12 ibm-ompi

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/20f1009df186bd52f322544e39a3595c

ibm-ompi avatar Dec 23 '21 23:12 ibm-ompi

@cchambreau can you fix the conflicts on this PR? Thanks!

jsquyres avatar Jan 01 '22 15:01 jsquyres

bot:ompi:retest

hppritcha avatar Jan 10 '22 17:01 hppritcha

Hey @hppritcha Are there any plans to continue this work (and delete PERUSE, per #189)?

jsquyres avatar Apr 08 '22 04:04 jsquyres

@jsquyres this is indeed kind of stalled out. We need assistance here from @cchambreau and @hjelmn

hppritcha avatar Apr 08 '22 14:04 hppritcha

@jsquyres, @hppritcha : I will take a pass at the requested changes and contact @hjelmn or others with questions.

cchambreau avatar Apr 13 '22 17:04 cchambreau

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6

ibm-ompi avatar Jun 01 '22 22:06 ibm-ompi

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6

ibm-ompi avatar Jun 01 '22 22:06 ibm-ompi

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/1956ee0eecf44a1368bfc1c598aad17c

ibm-ompi avatar Jun 01 '22 22:06 ibm-ompi

bot:ibm:retest

cchambreau avatar Jun 26 '22 20:06 cchambreau

bot:ibm:gnu:retest

cchambreau avatar Jun 27 '22 00:06 cchambreau

@cchambreau This PR has conflicts that must be resolved; it's unlikely that (re)running tests will be useful without fixing those conflict errors first.

jsquyres avatar Jun 27 '22 15:06 jsquyres

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/8f221d51807e996900724ae757e9168c

ibm-ompi avatar Jun 27 '22 21:06 ibm-ompi

@jsquyres I addressed the include file conflicts. The copyright conflict looks like a typo.
The PGI check is failing with:

  PPFC     profile/pcomm_agree_f08.lo
pgfortran-Error-Unknown switch: -iquote../../../..
make[2]: *** [Makefile:1464: comm_revoke_f08.lo] Error 1

I am unable to recreate this with my local PGI build.

The Pull Request Build Checker fails with:

make[3]: Entering directory '/home/ubuntu/workspace/open-mpi.build.distcheck/src/openmpi-gitclone/_build/sub/opal/mca/common/sm'
  CC       common_sm.lo
  CC       common_sm_mpool.lo
FATAL: command execution failed

Are you comfortable resolving the conflicts? Do you have any advice for the check failures?

cchambreau avatar Jun 27 '22 22:06 cchambreau

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/09dbaa6cd75a1ac69398d14942a9018b

ibm-ompi avatar Jun 28 '22 14:06 ibm-ompi

Can you rebase your branch? It looks like there are conflicts detected by GH. If the IBM PGI CI still fails after that I can look in the backend to see if I can get more information for you.

jjhursey avatar Jun 28 '22 15:06 jjhursey

> Can you rebase your branch? It looks like there are conflicts detected by GH. If the IBM PGI CI still fails after that I can look in the backend to see if I can get more information for you.

@cchambreau I think @jjhursey's advice is the best: rebase your PR branch on top of HEAD of main, and you'll end up fixing the conflicts. All the CI tests build/test an implicit merge from your PR branch to main. Hence, the tests are failing where the git conflicts between your branch and main are ending up in compile errors, etc.
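
One way to see locally what such an implicit merge would hit is a trial merge on a scratch branch. The following is only a rough sketch in a throwaway repository: the file name, commit messages, and the `trial-merge` branch are hypothetical, and it does not claim to reproduce what the CI systems actually run.

```shell
#!/bin/sh
# Sketch: try the PR-branch-into-main merge locally and list conflicted files.
# Everything here (repo, files, messages, trial-merge branch) is hypothetical.
set -e

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > shared.txt
git add shared.txt
git commit -q -m "base"

# The PR branch changes a file...
git checkout -q -b topic/mpi_t_events
echo "topic edit" > shared.txt
git commit -q -am "topic edits shared.txt"

# ...and main changes the same file in the meantime.
git checkout -q main
echo "main edit" > shared.txt
git commit -q -am "main edits shared.txt"

# Trial merge on a scratch branch; --no-commit leaves the result uncommitted.
git checkout -q -b trial-merge main
if git merge --no-commit --no-ff topic/mpi_t_events >/dev/null 2>&1; then
    merge_status=clean
else
    merge_status=conflict
fi

# Files still in the unmerged state are the ones with conflicts.
conflicted=$(git diff --name-only --diff-filter=U)
echo "trial merge: $merge_status"
echo "conflicted files: $conflicted"

# Throw the trial merge away again.
git merge --abort
```

If the trial merge reports conflicts, a build of the merged tree would see the same conflict markers, which fits the compile errors described above.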

jsquyres avatar Jun 28 '22 16:06 jsquyres

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/c60af88444836261a9a58e7c74a11d65

ibm-ompi avatar Jun 28 '22 17:06 ibm-ompi

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/b86ae4eaf9707b7e9b79a1deed491c2e

ibm-ompi avatar Jun 28 '22 17:06 ibm-ompi

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/d96b8bf838baa5c99c575572979ac2c8

ibm-ompi avatar Jun 28 '22 17:06 ibm-ompi

Missed a merge conflict. Rebuilding locally to identify any other missed conflicts before correcting.

cchambreau avatar Jun 28 '22 17:06 cchambreau

It looks like the rebase went wrong and instead picked up a merge commit (and a bunch of unrelated commits).

I'm not sure what is the easiest way to sort this out. Possibly git format-patch HASH to extract your commits from the branch, then hard rebase on main, then git am patchfile to bring back in your changes. Basically creating a new branch in place with just your changes - a manual rebase.
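
The manual rebase described above could look roughly like the sketch below. It runs in a throwaway repository rather than on this PR's history. File names and commit messages are made up, "hard rebase on main" is read as `git reset --hard main` (an assumption), and the merge base stands in for HASH.

```shell
#!/bin/sh
# Sketch of a "manual rebase" via format-patch / reset / am.
# Repo contents, commit messages and the patch file name are hypothetical.
set -e

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Topic branch carrying one commit of our own.
git checkout -q -b topic/mpi_t_events
echo events > events.txt
git add events.txt
git commit -q -m "add events"

# main moves on in the meantime.
git checkout -q main
echo newer > main.txt
git add main.txt
git commit -q -m "main moves on"

# 1. Extract the topic commits into a patch file.
#    format-patch leaves merge commits out of its output.
git checkout -q topic/mpi_t_events
base=$(git merge-base main topic/mpi_t_events)
git format-patch --stdout "$base"..HEAD > "$work/topic.patch"

# 2. Point the topic branch at the current tip of main.
git reset -q --hard main

# 3. Replay the extracted commits on top.
git am -q "$work/topic.patch"

git log --oneline
```

On a branch that also picked up unrelated commits, the range given to `git format-patch` would need to be narrowed to the commits that actually belong to the PR.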

jjhursey avatar Jun 28 '22 17:06 jjhursey

FYI: git rebase does not play well with merge commits in the history. It looks like there's at least one merge commit in here, and then a "Rebase with main" commit -- I'm not sure what that is. At this point, you might want to follow Josh's advice and make a new branch (with the same topic/mpi_t_events name) from main HEAD and manually replay your commits on that branch. Then force push back up to your fork, and the PR should update itself with the new state of your branch.
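
As a rough sketch of that recreate-and-force-push flow: the example below uses a throwaway repository with a local bare repository standing in for the fork. Only the branch name `topic/mpi_t_events` comes from this thread; the remote name `fork`, the files, and the commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: rebuild a topic branch from main, replay the wanted commit,
# and force-push it. The bare repo "fork.git" stands in for a GitHub fork.
set -e

work=$(mktemp -d)
cd "$work"
git init -q --bare fork.git
git init -q clone
cd clone
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add fork "$work/fork.git"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# The commit we actually want to keep.
git checkout -q -b topic/mpi_t_events
echo events > events.txt
git add events.txt
git commit -q -m "add events"
wanted=$(git rev-parse HEAD)

# Simulate the messy state: main moves on and gets merged into the topic branch.
git checkout -q main
echo newer > main.txt
git add main.txt
git commit -q -m "main moves on"
git checkout -q topic/mpi_t_events
git merge -q --no-ff -m "Merge main into topic" main
git push -q fork topic/mpi_t_events

# Recreate the branch under the same name from the tip of main...
git checkout -q main
git checkout -q -B topic/mpi_t_events main

# ...replay only the wanted commit(s)...
git cherry-pick "$wanted" >/dev/null

# ...and overwrite the old branch on the fork.
git push -q --force fork topic/mpi_t_events
```

Because the branch name is unchanged, a pull request opened from that branch follows the forced update, as noted above.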

jsquyres avatar Jun 28 '22 19:06 jsquyres

@jsquyres OK, will look into that.

cchambreau avatar Jun 28 '22 19:06 cchambreau