
Application remote workers are not cleaned up correctly for OpenMP applications

Open · rlyerly opened this issue 6 years ago · 1 comment

When running OpenMP applications, threads successfully migrate to and from remote nodes, but after the application finishes, the remote workers are left running.

To replicate, run the attached binary on multiple nodes. For example:

$ ./kmeans -n 2 -t 4 # Run on 2 nodes with 4 threads

On the remote node, after the application finishes, `cat /proc/popcorn_ps` still shows the remote worker alive (with no other application threads), and `top` shows the remote worker spinning.

Observed with commit ead581a591b4aa1cd268a6d66e072ea029c4ec3b on the master branch.

kmeans.tar.gz

rlyerly, May 10 '18 14:05

This appears to be a race condition when many threads are bulk-migrated back to the origin node. When kernel printing is enabled for thread migrations, the issue seems to disappear, likely because printing to the kernel log serializes the migrations.

rlyerly, May 23 '18 14:05