ckmulticast and any time migration
Original author: Yanhua Sun
Original issue: https://charm.cs.illinois.edu/redmine/issues/120
While working on intra-node anytime migration, we found that leanmd hangs because some multicast reduction messages are never received. I created a simple example to test this. To reproduce it, check out the intra-node-balancing branch and build Charm++ for SMP. The example is here: charmgit:benchmarks/intra-node-balance multicast_reduction
./charmrun +p4 ./multicast +ppn 4
Original date: 2014-01-08 00:38:18
The test charm++/delegation/multicast fails under randomized queueing (#259) because ckmulticast can't support anytime migration or reduction-before-broadcast. If ckmulticast goes away (#324), then this is irrelevant.
Original date: 2014-01-08 00:41:27
On Wed, Mar 13, 2013 at 4:09 PM, Yanhua Sun <sun51`illinois.edu> wrote:
I have a simple multicast example to test whether multicast works with anytime migration. I run it in SMP mode with 1 process and 2 threads. It hangs with migration.
I am wondering whether anyone knows what happens to multicast with anytime migration.
On Thu, Mar 14, 2013 at 1:37 PM, Sun, Yanhua <sun51`illinois.edu> wrote:
After more debugging, I found that multicast (without reduction) works with anytime migration.
Multicast plus a single reduction also works. However, if I perform three reductions with the same multicast manager, it either hangs or crashes with the following errors:
[0] Assertion "entry->rootSid.get_pe() != CkMyPe() || entry->rootSid.get_val() != entry" failed in file ckmulticast.C line 1274.
and also:
[0] Assertion "info.type==1" failed in file ../../../../bin/../include/cksection.h line 115.
On Thu, Mar 14, 2013 at 8:37 PM, Gengbin Zheng <gzheng`illinois.edu> wrote:
One thing to note is that cksection reduction can not be invoked consecutively: each reduction must be followed by a multicast. This might be a flaw in my implementation caused by its support for anytime migration. Somehow the sectionID held by the user needs to be refreshed with the new information because a new multicast tree is reconstructed.
Original date: 2015-02-09 20:01:58
This "bug" may just be a limitation of CkMulticast; as Gengbin put it, "cksection reduction can not be invoked consecutively".
Original date: 2015-10-26 19:51:36
The bug in delegation/multicast (#259) can be reproduced even without consecutive invocations of contribute. The bug is tied to anytime migration: if you contribute and then migrate, you can tickle that bug.