ckmulticast and any time migration

Open pplimport opened this issue 12 years ago • 4 comments

Original author: Yanhua Sun Original issue: https://charm.cs.illinois.edu/redmine/issues/120


While working on intra-node any time migration, we found that leanmd hangs because some multicast reduction messages are not received. I created a simple example to test it. To reproduce, check out the branch intra-node-balancing and build charm in SMP mode. The example is here: charmgit:benchmarks/intra-node-balance multicast_reduction

./charmrun +p4 ./multicast +ppn 4

pplimport, Mar 19 '13 02:03

Original date: 2014-01-08 00:38:18


The test charm++/delegation/multicast fails under randomized queueing (#259) because ckmulticast can't support anytime migration or reduction-before-broadcast. If it goes away (#324) then this is irrelevant.

PhilMiller, Apr 24 '19 20:04

Original date: 2014-01-08 00:41:27


On Wed, Mar 13, 2013 at 4:09 PM, Yanhua Sun <sun51`illinois.edu> wrote:

I have a simple multicast example to test whether multicast works with any time migration. I run it in SMP mode with 1 process and 2 threads. It hangs with migration.

I am wondering whether anyone knows what happens to multicast with any time migration.

On Thu, Mar 14, 2013 at 1:37 PM, Sun, Yanhua <sun51`illinois.edu> wrote:

After more debugging, I found multicast (without reduction) works with any time migration.

Multicast plus a single reduction works. However, if I have three reductions for the same multicast manager, it either hangs or crashes with the following errors:

[0] Assertion "entry->rootSid.get_pe() != CkMyPe() || entry->rootSid.get_val() != entry" failed in file ckmulticast.C line 1274.

also here

[0] Assertion "info.type==1" failed in file ../../../../bin/../include/cksection.h line 115.

On Thu, Mar 14, 2013 at 8:37 PM, Gengbin Zheng <gzheng`illinois.edu> wrote:

One thing to note is that cksection reductions cannot be invoked consecutively. Each reduction must be followed by a multicast. This might be a flaw in my implementation due to the support for any time migration. Somehow the sectionID held by the users needs to be refreshed with the new information when a new multicast tree is reconstructed.
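The usage rule described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `SectionCookie`, `ToySection`, `treeVersion`, and `consumed` are hypothetical names invented for illustration and are not the CkMulticast API or its internal bookkeeping. It assumes that a multicast hands each member a fresh cookie for the current spanning tree, and that a reduction is only valid with a fresh, unused cookie. Where the toy returns `false`, the real library may instead hang or hit an assertion, as reported in this thread.

```cpp
#include <cassert>

// Toy model only: every name here is hypothetical, NOT the CkMulticast API.
struct SectionCookie {
    int treeVersion = -1;  // tree this cookie refers to (-1: never refreshed)
    bool consumed = true;  // true once a reduction has used this cookie
};

class ToySection {
public:
    // A multicast refreshes the member's cookie for the current tree.
    void multicast(SectionCookie& cookie) const {
        cookie.treeVersion = treeVersion_;
        cookie.consumed = false;
    }

    // A migration forces the tree to be rebuilt, so older cookies go stale.
    void migrateMember() { ++treeVersion_; }

    // A reduction is accepted only with a fresh, unused cookie.
    bool contribute(SectionCookie& cookie) const {
        if (cookie.consumed || cookie.treeVersion != treeVersion_) {
            return false;  // stale or already used: needs a new multicast
        }
        cookie.consumed = true;
        return true;
    }

private:
    int treeVersion_ = 0;  // bumped every time the tree is rebuilt
};

// Usage sketch: each reduction must be preceded by its own multicast.
inline void demo() {
    ToySection section;
    SectionCookie cookie;

    section.multicast(cookie);
    assert(section.contribute(cookie));   // reduction right after a multicast
    assert(!section.contribute(cookie));  // consecutive reduction is rejected

    section.multicast(cookie);
    section.migrateMember();              // tree rebuilt after the multicast
    assert(!section.contribute(cookie));  // cookie is stale until refreshed

    section.multicast(cookie);
    assert(section.contribute(cookie));   // refreshed cookie works again
}
```

The model does not cover a reduction that is already in flight when a migration happens, which is a separate case.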

PhilMiller, Apr 24 '19 20:04

Original date: 2015-02-09 20:01:58


This "bug" may just be a limitation of CkMulticast. As Gengbin noted, "cksection reduction can not be invoked consecutively".

lifflander, Apr 24 '19 20:04

Original date: 2015-10-26 19:51:36


The bug in delegation/multicast (#259) can be reproduced even without consecutive invocations of contribute. The bug is tied to anytime migration: if you contribute and then migrate, you can trigger it.

ericjbohm, Apr 24 '19 20:04