chapel
chapel copied to clipboard
Should coforall intent impact a nested 'on' statement?
In Chapel, the coforall + on pattern is common. For example
coforall loc in Locales {
on loc {
// do something
}
}
I realized that I wasn't sure what would happen if you use a with (in x) clause on the coforall in that case. Will it create a local copy of x per locale? It would be convenient if it did; although perhaps that is baking in the behavior of merging coforall+on into nonblocking remote tasks.
The following program demonstrates this pattern and below I show the output. The behavior today is inconsistent:
with (in x)does in fact causexto be local on each locale- BUT an array stored within a record is not copied in this case
Contrast with forall over a block domain, where both the x variable and the inner array are localized to the destination locale.
What should it do?
use BlockDist;
proc main() {
var x: R;
x.v = 1;
writeln("coforall+on");
coforall loc in Locales with (in x) {
on loc {
writeln("I'm on locale ", here.id, " with copy ", x,
" on locale ", x.locale.id,
" x.elts on locale ", x.elts[0].locale.id);
}
}
const D = blockDist.createDomain(0..<numLocales);
writeln("forall over Block with 1 element per locale");
forall locid in D with (in x) {
writeln("I'm on locale ", here.id, " with copy ", x,
" on locale ", x.locale.id,
" x.elts on locale ", x.elts[0].locale.id);
}
}
record R {
var v: int;
var elts: [0..10] int;
proc init() {
writeln("R.init() on ", here.id);
v = 42;
}
proc init=(const ref rhs: R) {
writeln("R.init=() on ", here.id, " rhs is on ", rhs.locale.id);
}
// unrelated issue: uncommenting this leads to compilation error
/*operator =(ref lhs, const ref rhs) {
writeln("R.= on ", here.id, " rhs is on ", rhs.locale.id);
}*/
}
Compile and run
$ chpl aa.chpl && ./aa -nl 3
R.init() on 0
coforall+on
R.init=() on 0 rhs is on 0
R.init=() on 0 rhs is on 0
I'm on locale 0 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 0 x.elts on locale 0
R.init=() on 0 rhs is on 0
I'm on locale 1 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 1 x.elts on locale 0
I'm on locale 2 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 2 x.elts on locale 0
forall over Block with 1 element per locale
R.init=() on 0 rhs is on 0
R.init=() on 0 rhs is on 0
R.init=() on 0 rhs is on 0
R.init=() on 2 rhs is on 2
R.init=() on 1 rhs is on 1
R.init=() on 0 rhs is on 0
R.init=() on 2 rhs is on 2
R.init=() on 1 rhs is on 1
I'm on locale 0 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 0 x.elts on locale 0
I'm on locale 1 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 1 x.elts on locale 1
I'm on locale 2 with copy (v = 0, elts = 0 0 0 0 0 0 0 0 0 0 0) on locale 2 x.elts on locale 2
Will it create a local copy of x per locale?
My guess is that, by a strict semantic interpretation of the language, it wouldn't and shouldn't (and that we should introduce on intents to guarantee this, as proposed in https://github.com/chapel-lang/chapel/issues/26752 and https://github.com/chapel-lang/chapel/issues/18585).
But I could imagine rvf firing, depending on what happens to x within and around the loop. For your example, x seems to be read-only within the loop making it seem like it'd plausibly be firing. Have you checked whether it is or tried disabling it to see whether the behavior is different? [note that I think rvf for a record with an array field should forward the array as well, but am not sure what behavior it might have today].
This also gets into the longstanding sticky question of what .locale on an RVF'd variable should return, or whether querying .locale should disable RVF to avoid confusion.
Using --no-remote-value-forwarding does not change the inconsistency. (Neither does --fast, FWIW).
Closing tabs, I realized I grabbed for the wrong issue above, and then was more surprised to find that maybe the issue I was thinking of didn't exist. Filed https://github.com/chapel-lang/chapel/issues/26752 to address this and updated my previous comment to update the incorrect link.
@mppf: You've made me really curious about the source of the inconsistency given that the forall loop is implemented in terms of coforall and on... 🤔
I have a pretty good guess why it's inconsistent. I think it's because the forall has two coforalls in the iterator, one for locales, (then the on statement), then a coforall for tasks. Both coforalls will get the in x intent behavior (to implement the forall intent). As a result, the inner coforall will copy x to the current locale.
That's a great guess—sounds very likely to be the case to me, thanks!
I was recently thinking about this in the context of the suffix sorting application. I realized I didn't know if or why the code was not making a copy on Locale 0 of the equivalent of x in the application (which is an array of splitters for partitioning / sorting).
I made this example
use BlockDist;
proc main() {
var x: R;
x.v = 1;
writeln("coforall+on explicit");
coforall loc in Locales {
on loc {
var myx = x;
writeln("I'm on locale ", here.id, " with copy ", myx,
" on locale ", myx.locale.id,
" x.elts on locale ", myx.elts[0].locale.id);
}
}
const D = blockDist.createDomain(0..1000);
writeln("forall with divideByLocales");
forall (activeLocIdx, locRegion)
in divideByLocales(D, 0..<numLocales, Locales)
with (in x) {
writeln("I'm on locale ", here.id, " with copy ", x,
" on locale ", x.locale.id,
" x.elts on locale ", x.elts[0].locale.id);
}
}
/* Given a Block distributed domain or non-distributed domain,
this iterator divides it into per-locale chunks and processes
each on its owning locale.
yields (activeLocIdx, chunk)
*/
iter divideByLocales(const Dom: domain(?),
const region: range,
const ref activeLocales=Locales) {
if Dom.rank != 1 then compilerError("divideByLocales only supports 1-D");
if Dom.dim(0).strides != strideKind.one then
compilerError("divideByLocales only supports non-strided domains");
yield (0, Dom.dim(0));
halt("serial divideByLocales should not be called");
}
iter divideByLocales(param tag: iterKind,
const Dom: domain(?),
const region: range,
const ref activeLocales=Locales)
where tag == iterKind.standalone {
if Dom.rank != 1 then compilerError("divideByLocales only supports 1-D");
if Dom.dim(0).strides != strideKind.one then
compilerError("divideByLocales only supports non-strided domains");
if !Dom.hasSingleLocalSubdomain() {
compilerError("divideByLocales only supports dists " +
"with single local subdomain");
// note: it'd be possible to support; would just need to be written
// differently, and consider both
// # local subdomains < nTasksPerLocale and the inverse.
}
coforall (loc, activeLocIdx) in zip(activeLocales, 0..) {
on loc {
const ref locDom = Dom.localSubdomain();
const locRegion = locDom.dim(0)[region];
yield (activeLocIdx, locRegion);
}
}
}
record R {
var v: int;
var elts: [0..10] int;
proc init() {
writeln("R.init() on ", here.id);
v = 42;
}
proc init=(const ref rhs: R) {
writeln("R.init=() on ", here.id, " rhs is on ", rhs.locale.id);
}
// unrelated issue: uncommenting this leads to compilation error
/*operator =(ref lhs, const ref rhs) {
writeln("R.= on ", here.id, " rhs is on ", rhs.locale.id);
}*/
}
It appears to work to use forall with divideByLocales & in particular I'm not seeing numLocales init= calls on Locale 0, but I observed that the init= doesn't seem to be doing the communication (not sure what is), we see e.g. R.init=() on 2 rhs is on 2. I wonder if this causes it to be less optimized. I changed the application to use the coforall+on explicit pattern (which required adding a barrier) and I observed a 12% performance improvement.