
[Bug]: strange behavior in zippered loop expressions with distributed and DR domains

Open jeremiah-corrado opened this issue 1 year ago • 6 comments

Summary of Problem

"array index out of bounds" errors can arise incorrectly in zippered iterator expressions, when an index from a non-distributed domain is used to index into a distributed domain in the loop body — where the array being indexed into does not have the same size/distribution as the domain used as the leader iterator. This pattern can also create incorrect results when checks are not enabled.

Steps to Reproduce

use BlockDist;

var A: [blockDist.createDomain({1..100})] int = 5;

var db = {1..50};
var B: [blockDist.createDomain(db)] int = 1;

forall (b, i) in zip(B, db) {
  b = A[i];
}

writeln(B);

Compiling and running the above program with checks enabled and CHPL_COMM != none produces the following error, in which the reported array bounds are smaller than the array's actual domain (1..100). I've only observed this when running on 2 locales.

asdf.chpl:9: error: halt reached - array index out of bounds
note: index was 26 but array bounds are 51..100

This message is referring to the A[i] access. The exact out-of-bounds index can vary from run to run depending on which task runs first on the second locale.

If the program is compiled with --fast or --no-checks, it produces the following incorrect result (where the entire smaller array should be set to 5):

5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The above pattern should perhaps be discouraged for performance reasons (i.e., forall (b, i) in zip(B, B.domain) probably makes more sense). In that case, a warning against using it may be appropriate (see related issue: https://github.com/chapel-lang/chapel/issues/25707).
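For reference, a minimal sketch of that aligned alternative, assuming the same declarations as in the repro above:

```chapel
use BlockDist;

var A: [blockDist.createDomain({1..100})] int = 5;
var B: [blockDist.createDomain({1..50})] int = 1;

// Zippering B with its own domain keeps the leader and follower aligned:
// each index i yielded to a task is local to that task's chunk of B, so
// the leader/follower alignment mismatch from the repro cannot occur.
forall (b, i) in zip(B, B.domain) do
  b = A[i];

writeln(B);
```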

Configuration Information

printchplenv...
CHPL_HOST_PLATFORM: darwin
CHPL_HOST_COMPILER: clang
  CHPL_HOST_CC: clang
  CHPL_HOST_CXX: clang++
CHPL_HOST_ARCH: arm64
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: clang *
  CHPL_TARGET_CC: clang
  CHPL_TARGET_CXX: clang++
  CHPL_TARGET_LD: /usr/bin/clang++
CHPL_TARGET_ARCH: arm64
CHPL_TARGET_CPU: unknown
CHPL_LOCALE_MODEL: flat *
CHPL_COMM: gasnet *
  CHPL_COMM_SUBSTRATE: smp *
  CHPL_GASNET_SEGMENT: fast
  CHPL_GASNET_VERSION: ex
CHPL_TASKS: qthreads *
CHPL_LAUNCHER: smp
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_HOST_MEM: cstdlib *
CHPL_MEM: jemalloc *
  CHPL_TARGET_JEMALLOC: bundled
CHPL_ATOMICS: cstdlib
  CHPL_NETWORK_ATOMICS: none
CHPL_GMP: bundled *
CHPL_HWLOC: bundled
CHPL_RE2: bundled *
CHPL_LLVM: system *
  CHPL_LLVM_SUPPORT: system
  CHPL_LLVM_CONFIG: /opt/homebrew/opt/llvm@17/bin/llvm-config
  CHPL_LLVM_VERSION: 17 *
CHPL_AUX_FILESYS: none
CHPL_LIB_PIC: none
CHPL_SANITIZE: none
CHPL_SANITIZE_EXE: none

jeremiah-corrado avatar Aug 07 '24 20:08 jeremiah-corrado

It looks to me like this is likely due to the --[dynamic]-auto-local-access optimization, which is on by default, since disabling it with either --no-dynamic-auto-local-access or --no-auto-local-access seems to make the program work. Tagging @e-kayrakli (who implemented the optimization) to confirm, and in hopes that the fix is an easy one.

bradcray avatar Aug 12 '24 22:08 bradcray

The pattern is not a common one, but that is not an excuse for an obvious bug :)

diff --git a/compiler/optimizations/forallOptimizations.cpp b/compiler/optimizations/forallOptimizations.cpp
index 10d902ccb0..6de2f4965d 100644
--- a/compiler/optimizations/forallOptimizations.cpp
+++ b/compiler/optimizations/forallOptimizations.cpp
@@ -1247,18 +1247,20 @@ static void optimizeLoop(ForallStmt *forall,
   std::vector< std::pair<CallExpr *, int> >::iterator it;
   for(it = candidates.begin() ; it != candidates.end() ; it++) {
     CallExpr *candidate = it->first;
-    int iterandIdx = it->second;
+    //int iterandIdx = it->second;

     Symbol *checkSym = generateStaticCheckForAccess(candidate,
                                                     forall,
-                                                    iterandIdx,
+                                                    //iterandIdx,
+                                                    0,
                                                     staticCond);
     if (!doStatic) {
       forall->optInfo.staticCheckSymsForDynamicCandidates.push_back(checkSym);

       generateDynamicCheckForAccess(candidate,
                                     forall,
-                                    iterandIdx,
+                                    //iterandIdx,
+                                    0,
                                     dynamicCond);
     }

is the hack that fixes it. I am dropping this as a note for the future. A real fix would almost certainly conflict with https://github.com/chapel-lang/chapel/pull/25712. I intend to get that merged this week; after that I can probably put a proper solution in.

My current thinking is that this is an oversight on my part: for some reason, our alignment check for the array here (A) was run against its corresponding iterand (db) instead of the loop's leader domain (B.domain). Technically, A.domain and db are aligned in this context, to support:

coforall l in Locales do on l {
  forall idx in Arr.localSubdomain() { // note that `db` is `A`'s localSubdomain with -nl 2
    Arr[idx]
  }
}

We should check alignment between A.domain and B.domain instead, which should fail.

I only tested this fix with a handful of existing tests that I expect to exercise similar logic. There could be other issues that I am not seeing at the moment, but as I said, after reading the OP and the compiler code, the issue was relatively obvious.

e-kayrakli avatar Aug 12 '24 23:08 e-kayrakli

A real fix would almost certainly conflict with https://github.com/chapel-lang/chapel/pull/25712. … After that I can probably put a proper solution in.

If there's any doubt that we can do both (focusing on "probably") I'd prefer that we prioritize fixing the bug first and then merge 25712 rather than the other way around. I.e., I wouldn't want to release (and promote) an extension to the optimization when the optimization itself had a known flaw (that presumably the extension could be sensitive to as well?).

The pattern is not a common one

Sure, but it seems like any case in which the leader and an index-generating follower have different alignments could cause this problem, couldn't it?

bradcray avatar Aug 13 '24 00:08 bradcray

Sure, but it seems like any case in which the leader and an index-generating follower have different alignments could cause this problem, couldn't it?

Yes. Probably obvious, but the generated index must be used in an A[i] access where A is a distributed array.

In any case, I agree with your general assessment about priorities. But I am confident enough in the fix to merge that PR first, since it is relatively large and likelier to go stale if I sit on it for long, and then put the fix in afterwards.

e-kayrakli avatar Aug 13 '24 00:08 e-kayrakli

Thanks for taking a quick look!

bradcray avatar Aug 13 '24 00:08 bradcray

Almost moot with the PR auto-linked above, but

Sure, but it seems like any case in which the leader and an index-generating follower have different alignments could cause this problem, couldn't it?

Yes. Probably obvious but that generated index must be used in a A[i] where A is a distributed array.

Also, the index-generating follower must be contained within A's local subdomain. Jeremiah notes that

I've only observed this when running on 2 locales.

If you bump up the number of locales, db becomes bigger than A's local subdomain, failing the dynamic check for the optimization. When the optimization is thwarted, you get correct behavior.
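To illustrate (a hypothetical sketch, not from the original thread): with a block distribution of {1..100}, each locale's chunk of A shrinks as locales are added, so db = {1..50} is only contained in a local subdomain when running on 2 locales:

```chapel
use BlockDist;

var A: [blockDist.createDomain({1..100})] int;

// Print each locale's portion of A. With -nl 2, locale 0 owns 1..50 (== db)
// and locale 1 owns 51..100; with -nl 4, locale 0 owns only 1..25, so db is
// no longer contained in any single locale's subdomain and the dynamic
// containment check fails, disabling the (mis-applied) optimization.
coforall loc in Locales do on loc do
  writeln(here.id, ": ", A.localSubdomain());
```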

I am posting this mostly to have the correct record here, not to defend the implementation :) I am currently testing the fix, and if there are any big issues that I am not seeing I'll report back here.

e-kayrakli avatar Aug 13 '24 23:08 e-kayrakli