chapel icon indicating copy to clipboard operation
chapel copied to clipboard

Should coforall intent impact a nested 'on' statement?

Open mppf opened this issue 9 months ago • 7 comments

In Chapel, the coforall + on pattern is common. For example

  coforall loc in Locales {
    on loc {
      // do something
    }
  }

I realized that I wasn't sure what would happen if you use a with (in x) clause on the coforall in that case. Will it create a local copy of x per locale? It would be convenient if it did; although perhaps that is baking in the behavior of merging coforall+on into nonblocking remote tasks.

The following program demonstrates this pattern and below I show the output. The behavior today is inconsistent:

  • with (in x) does in fact cause x to be local on each locale
  • BUT an array stored within a record is not copied in this case

Contrast with forall over a block domain, where both the x variable and the inner array are localized to the destination locale.

What should it do?

use BlockDist;

proc main() {
  var x: R;
  x.v = 1;

  writeln("coforall+on");
  coforall loc in Locales with (in x) {
    on loc {
      writeln("I'm on locale ", here.id, " with copy ", x,
              " on locale ", x.locale.id,
              " x.elts on locale ", x.elts[0].locale.id);
    }
  }

  const D = blockDist.createDomain(0..<numLocales); 
  writeln("forall over Block with 1 element per locale");
  forall locid in D with (in x) {
    writeln("I'm on locale ", here.id, " with copy ", x,
            " on locale ", x.locale.id,
            " x.elts on locale ", x.elts[0].locale.id);
  }
}

record R {
  var v: int;
  var elts: [0..10] int;
  proc init() {
    writeln("R.init() on ", here.id);
    v = 42;
  }
  proc init=(const ref rhs: R) {
    writeln("R.init=() on ", here.id, " rhs is on ", rhs.locale.id);
  }
  // unrelated issue: uncommenting this leads to compilation error
  /*operator =(ref lhs, const ref rhs) {
    writeln("R.= on ", here.id, " rhs is on ", rhs.locale.id);
  }*/
}

Compile and run

$ chpl aa.chpl  && ./aa -nl 3
R.init() on 0
coforall+on
R.init=() on 0 rhs is on 0
R.init=() on 0 rhs is on 0
I'm on locale 0 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 0 x.elts on locale 0
R.init=() on 0 rhs is on 0
I'm on locale 1 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 1 x.elts on locale 0
I'm on locale 2 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 2 x.elts on locale 0
forall over Block with 1 element per locale
R.init=() on 0 rhs is on 0
R.init=() on 0 rhs is on 0
R.init=() on 0 rhs is on 0
R.init=() on 2 rhs is on 2
R.init=() on 1 rhs is on 1
R.init=() on 0 rhs is on 0
R.init=() on 2 rhs is on 2
R.init=() on 1 rhs is on 1
I'm on locale 0 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 0 x.elts on locale 0
I'm on locale 1 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 1 x.elts on locale 1
I'm on locale 2 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 2 x.elts on locale 2

mppf avatar Feb 20 '25 23:02 mppf

Will it create a local copy of x per locale?

My guess is that, by a strict semantic interpretation of the language, it wouldn't and shouldn't (and that we should introduce on intents to guarantee this, as proposed in https://github.com/chapel-lang/chapel/issues/26752 and https://github.com/chapel-lang/chapel/issues/18585).

But I could imagine rvf firing, depending on what happens to x within and around the loop. For your example, x seems to be read-only within the loop making it seem like it'd plausibly be firing. Have you checked whether it is or tried disabling it to see whether the behavior is different? [note that I think rvf for a record with an array field should forward the array as well, but am not sure what behavior it might have today].

This also gets into the longstanding sticky question of what .locale on an RVF'd variable should return, or whether querying .locale should disable RVF to avoid confusion.

bradcray avatar Feb 20 '25 23:02 bradcray

Using --no-remote-value-forwarding does not change the inconsistency. (Neither does --fast, FWIW).

mppf avatar Feb 21 '25 00:02 mppf

Closing tabs, I realized I grabbed for the wrong issue above, and then was more surprised to find that maybe the issue I was thinking of didn't exist. Filed https://github.com/chapel-lang/chapel/issues/26752 to address this and updated my previous comment to update the incorrect link.

bradcray avatar Feb 21 '25 01:02 bradcray

@mppf: You've made me really curious about the source of the inconsistency given that the forall loop is implemented in terms of coforall and on... 🤔

bradcray avatar Feb 21 '25 01:02 bradcray

I have a pretty good guess why it's inconsistent. I think it's because the forall has two coforalls in the iterator, one for locales, (then the on statement), then a coforall for tasks. Both coforalls will get the in x intent behavior (to implement the forall intent). As a result, the inner coforall will copy x to the current locale.

mppf avatar Feb 21 '25 14:02 mppf

That's a great guess—sounds very likely to be the case to me, thanks!

bradcray avatar Feb 21 '25 17:02 bradcray

I was recently thinking about this in the context of the suffix sorting application. I realized I didn't know if or why the code was not making a copy on Locale 0 of the equivalent of x in the application (which is an array of splitters for partitioning / sorting).

I made this example

use BlockDist;

proc main() {
  var x: R;
  x.v = 1;

  writeln("coforall+on explicit");
  coforall loc in Locales {
    on loc {
      var myx = x;
      writeln("I'm on locale ", here.id, " with copy ", myx,
              " on locale ", myx.locale.id,
              " x.elts on locale ", myx.elts[0].locale.id);
    }
  }

  const D = blockDist.createDomain(0..1000);
  writeln("forall with divideByLocales");
  forall (activeLocIdx, locRegion)
  in divideByLocales(D, 0..<numLocales, Locales)
  with (in x) {
    writeln("I'm on locale ", here.id, " with copy ", x,
            " on locale ", x.locale.id,
            " x.elts on locale ", x.elts[0].locale.id);
  }
}

/* Given a Block distributed domain or non-distributed domain,
   this iterator divides it into per-locale chunks and processes
   each on its owning locale.

   yields (activeLocIdx, chunk)
*/
iter divideByLocales(const Dom: domain(?),
                     const region: range,
                     const ref activeLocales=Locales) {
  if Dom.rank != 1 then compilerError("divideByLocales only supports 1-D");
  if Dom.dim(0).strides != strideKind.one then
    compilerError("divideByLocales only supports non-strided domains");
  yield (0, Dom.dim(0));
  halt("serial divideByLocales should not be called");
}
iter divideByLocales(param tag: iterKind,
                     const Dom: domain(?),
                     const region: range,
                     const ref activeLocales=Locales)
 where tag == iterKind.standalone {

  if Dom.rank != 1 then compilerError("divideByLocales only supports 1-D");
  if Dom.dim(0).strides != strideKind.one then
    compilerError("divideByLocales only supports non-strided domains");
  if !Dom.hasSingleLocalSubdomain() {
    compilerError("divideByLocales only supports dists " +
                  "with single local subdomain");
    // note: it'd be possible to support; would just need to be written
    // differently, and consider both
    //  # local subdomains < nTasksPerLocale and the inverse.
  }

  coforall (loc, activeLocIdx) in zip(activeLocales, 0..) {
    on loc {
      const ref locDom = Dom.localSubdomain();
      const locRegion = locDom.dim(0)[region];
      yield (activeLocIdx, locRegion);
    }
  }
}


record R {
  var v: int;
  var elts: [0..10] int;
  proc init() {
    writeln("R.init() on ", here.id);
    v = 42;
  }
  proc init=(const ref rhs: R) {
    writeln("R.init=() on ", here.id, " rhs is on ", rhs.locale.id);
  }
  // unrelated issue: uncommenting this leads to compilation error
  /*operator =(ref lhs, const ref rhs) {
    writeln("R.= on ", here.id, " rhs is on ", rhs.locale.id);
  }*/
}

It appears to work to use forall with divideByLocales & in particular I'm not seeing numLocales init= calls on Locale 0, but I observed that the init= doesn't seem to be doing the communication (not sure what is), we see e.g. R.init=() on 2 rhs is on 2. I wonder if this causes it to be less optimized. I changed the application to use the coforall+on explicit pattern (which required adding a barrier) and I observed a 12% performance improvement.

mppf avatar Mar 17 '25 19:03 mppf