aomp report AOMP issues

For the repository https://github.com/zjin-lcf/oneAPI-DirectProgramming, the examples that fail on an AMD GPU (gfx906) are listed below. The HIP version of each kernel may help when running the HIP and OMP programs side by side for comparison.
Please type ‘make -f Makefile.aomp run’ in each program directory. The result mismatches are not trivial floating-point differences. A memory access fault does not necessarily mean that there is a bug in my OMP program. Thank you for your reviews and tests.
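
To illustrate what separates a trivial floating-point difference from a real mismatch, here is a minimal sketch of a tolerance-based comparison. It is not code from the repository: the helper name `count_mismatches` and the default tolerance `1e-6` are assumptions chosen for illustration, and each example may need its own threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the repository): counts the elements whose
// device value differs from the reference by more than a relative tolerance.
// Non-finite device values (inf or nan) always count as mismatches.
std::size_t count_mismatches(const std::vector<double>& ref,
                             const std::vector<double>& dev,
                             double rel_tol = 1e-6) {
  assert(ref.size() == dev.size());
  std::size_t bad = 0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    if (!std::isfinite(dev[i])) {
      ++bad;
      continue;
    }
    // Scale the tolerance by the reference magnitude, with a floor of 1.0
    // so that values near zero are compared with an absolute tolerance.
    const double scale = std::fmax(std::fabs(ref[i]), 1.0);
    if (std::fabs(dev[i] - ref[i]) > rel_tol * scale) ++bad;
  }
  return bad;
}
```

A return value of zero would mean the device output agrees with the reference within the chosen tolerance. Outputs containing "inf", or errors as large as those reported below, would be counted as mismatches.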

all-pairs-distance-omp (fixed)

results mismatch

asta-omp (fixed)

/home/usr/lib/aomp/lib/clang/13.0.0/include/__clang_hip_math.h:1342:11: error: declaration of anonymous class must be a definition
template <class T> __DEVICE__ T min(T __arg1, T __arg2) {

atomicIntrinsics-omp (enhancement)

support #pragma omp atomic compare
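
For context, the requested directive is the OpenMP 5.1 conditional-update form of `atomic`. The following is a minimal host-side sketch of an atomic max, not the code in atomicIntrinsics-omp. The function name `parallel_max` is hypothetical, and the `_OPENMP >= 202011` guard assumes the compiler advertises OpenMP 5.1 when it accepts the directive. Otherwise the sketch falls back to a critical section.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: returns the maximum of `data` by updating a shared
// variable from a parallel loop.
int parallel_max(const std::vector<int>& data) {
  assert(!data.empty());
  int result = data[0];
  const int n = static_cast<int>(data.size());
  #pragma omp parallel for shared(result)
  for (int i = 0; i < n; ++i) {
    const int v = data[i];
#if defined(_OPENMP) && _OPENMP >= 202011
    // OpenMP 5.1 conditional update: atomically store v if it is larger.
    #pragma omp atomic compare
    if (result < v) { result = v; }
#else
    // Fallback when OpenMP 5.1 is not advertised (or OpenMP is disabled).
    #pragma omp critical
    { if (result < v) result = v; }
#endif
  }
  return result;
}
```

Without the directive, a device kernel would typically need a compare-and-swap loop or a vendor intrinsic to express the same update.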

axhelm-omp (fixed) correctness check fails

./axhelm 1 8000 100 (fixed)
word size: 8 bytes
Correctness check: maxError = 1609.05
 NRepetitions=100 Ndim=1 N=7 Nelements=8000 elapsed time=2.59332e+07 GDOF/s=0.10581 GB/s=13.8991 GFLOPS/s=18.6374
./axhelm 3 8000 100
word size: 8 bytes
Correctness check: maxError = 1634.36
 NRepetitions=100 Ndim=3 N=7 Nelements=8000 elapsed time=5.61527e+07 GDOF/s=0.1466 GB/s=8.75327 GFLOPS/s=26.041

boxfilter-omp (fixed)

results mismatch

ced-omp (fixed)

results mismatch

compute-score-omp (fixed)

results mismatch

convolutionSeparable-omp

./main
[GPU Memory Error] Addr: 0x7f1f9b400000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x21170e0) on address 0x7f1f9b400000. Reason: Page not present or supervisor privilege.

crc64-omp

Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_DEBUG=4 to dump host-target pointer mappings.
Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory

d2q9-bgk-omp (fixed)

results mismatch

python check/check.py --ref-av-vels-file=./check/256x256.av_vels.dat \
        --ref-final-state-file=./check/256x256.final_state.dat \
        --av-vels-file=./av_vels.dat \
        --final-state-file=./final_state.dat
check/check.py:80: RuntimeWarning: invalid value encountered in divide
  diff_pcnt = 100.0*(diff/(ref_vals - diff))
Total difference in av_vels : INF
Biggest difference (at step 0) : INF
  -INF vs. 5.448322099360E-06 = nan%
()
Total difference in final_state : 2.900517986625E+00
Biggest difference (at coord (1,254)) : -4.527248199000E-05
  3.327693790197E-02 vs. 3.323166541998E-02 = -0.14%
()
av_vels failed check

dct8x8-omp

results mismatch

dxtc1-omp (fixed)

results mismatch

fft-omp (fixed)

results mismatch

filter-omp (fixed)

[GPU Memory Error] Addr: 0x7f787ea0c000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0xb690e0) on address 0x7f787ea0c000. Reason: Page not present or supervisor privilege.

fpc-omp

results mismatch

gmm-omp

Starting with 2 cluster(s), will stop at 1 cluster(s).
[/home/release/git/aomp13/llvm-project/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp:277] GPU error in queue 0x7fabca0a0000 4111 (HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.)

histogram-omp (fixed)

LLVM ERROR: Cannot select: t12: i32,ch = AtomicLoadAdd<(load store monotonic 4 on %ir.arrayidx30.i.us.i, addrspace 5)> t53:1, t10, Constant:i32<1>
  t10: i32 = add FrameIndex:i32<0>, t9
    t7: i32 = FrameIndex<0>
    t9: i32 = shl t53, Constant:i32<2>
      t53: i32,ch = load<(load 1 from %ir.lsr.iv11, !tbaa !61, !noalias !78), zext from i8> t0, t2, undef:i64
        t2: i64,ch = CopyFromReg t0, Register:i64 %185
          t1: i64 = Register %185
        t4: i64 = undef
      t8: i32 = Constant<2>
  t11: i32 = Constant<1>
In function: __omp_offloading_2c_6ba2790__Z16run_smem_atomicsILi1ELi256EhEdPT1_iiPjb_l56

hybridsort-omp (fixed)

LLVM ERROR: Cannot select: t25: i32,ch = AtomicLoadAdd<(load store monotonic 4 on %ir.arrayidx18.i.i, addrspace 5)> t82:1, t23, Constant:i32<1>
  t23: i32 = add FrameIndex:i32<0>, t22
    t20: i32 = FrameIndex<0>
    t22: i32 = shl t19, Constant:i32<2>
      t19: i32 = or t16, t18
        t16: i32 = and t14, Constant:i32<1023>
          t14: i32 = fp_to_uint t13
            t13: f32 = fmul t81, ConstantFP:f32<1.024000e+03>
              t81: f32 = DIV_FIXUP nofpexcept t80, t10, t8
                t80: f32 = DIV_FMAS nofpexcept t79, t75, t78, t70:1
                  t79: f32 = fma nofpexcept t72, t78, t70
  ...

knn-omp (fixed)

results mismatch

lanczos-omp (fixed)

results mismatch

medianfilter-omp (fixed)

results mismatch

nms-omp (fixed)

results mismatch

nw-omp (fixed)

segmentation fault

pathfinder-omp (fixed)

[GPU Memory Error] Addr: 0x7f01a807a000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x16540f0) on address 0x7f01a807a000. Reason: Page not present or supervisor privilege.

particlefilter-omp (fixed)

values are "inf" in output.txt

particles-omp

hanging

quicksort-omp

[GPU Memory Error] Addr: 0x7f166815f000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0xdf00e0) on address 0x7f166815f000. Reason: Page not present or supervisor privilege.

radixsort-omp

Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0

recursiveGaussian-omp (fixed)

results mismatch

reverse-omp (fixed)

results mismatch

scan-omp (fixed)

results mismatch

sobol-omp (fixed)

results mismatch

sort-omp (fixed)

[GPU Memory Error] Addr: 0x7fee955af000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x16340e0) on address 0x7fee955af000. Reason: Page not present or supervisor privilege.

split-omp (fixed)

[GPU Memory Error] Addr: 0x7fa9e2083000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x18ba0e0) on address 0x7fa9e2083000. Reason: Page not present or supervisor privilege.

streamcluster-omp

results mismatch  (the output file is output.txt)

transpose-omp (fixed)

results mismatch

tridiagonal-omp (fixed)

results mismatch (the results are “inf”)

metropolis-omp

clang-13: /home/release/git/aomp13/llvm-project/llvm/lib/IR/Constants.cpp:2468: static llvm::Constant* llvm::ConstantExpr::getICmp(short unsigned int, llvm::Constant*, llvm::Constant*, bool): Assertion `LHS->getType() == RHS->getType()' failed.

minimod

./main --grid 100 --nsteps 1000
[/home/release/git/aomp13/llvm-project/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp:277] GPU error in queue 0x7f6b0d1d6000 4111 (HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.)

sobel-omp

hanging

epistasis-omp

hanging

scan2-omp

./main 100 33554432 256
Executing kernel for 100 iterations
-------------------------------------------
Failed

vmc-omp

Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0

bonds-omp

Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.

reaction-omp (do not match the expected output)

  Components A | B
  Min =     0.045745 |     0.000000
  Max =     1.000006 |     0.000000

Apr 24 '21 16:04 zjin-lcf

Ethan will review this with the OpenMP on LLVM team.

Apr 26 '21 14:04 gregrodgers

Okay. Please let me know if any of my OMP programs are not written correctly. Thanks.

Apr 26 '21 19:04 zjin-lcf

Will provide an update using AOMP_13.0-3 dev. There is a fix in place from trunk that corrects many of the result mismatches.

Jun 03 '21 18:06 estewart08

I look forward to AOMP_13.0-3 dev.

Jun 03 '21 19:06 zjin-lcf

This is based on aomp_13.0-3 dev, which is a preview of the next release. Some of these programs produced no output (I am not sure whether that means pass or fail). I did not choose most of the inputs for any specific reason. If you have suggested inputs, let me know.

all-pairs-distance-omp

PASS
PASS

asta-omp

/__clang_hip_math.h:1325:11: error: declaration of anonymous class must be a definition

atomicIntrinsics-omp

no output - pass?

axhelm-omp

./axhelm 1 8000 100
Correctness check: maxError = 0.000366211
./axhelm 3 8000 100
Correctness check: maxError = 0.000488281

Is this a pass?

boxfilter-omp

PASS

ced-omp

Test Passed

compute-score-omp

Verification: PASS

convolutionSeparable-omp

Memory access fault

crc64-omp

Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory

d2q9-bgk-omp

./main Inputs/input_256x256.params Obstacles/obstacles_256x256.dat
==done==
Reynolds number:                1.006634616852E+01
Elapsed time:                   11.479089 (s)

Is this a pass?

dct8x8-omp

FAIL

dxtc1-omp

main: main.cpp:47: int main(int, char **): Assertion `image_path != NULL' failed.
Aborted (core dumped)

fft-omp

Segmentation fault (core dumped)

filter-omp

Filter using shared memory PASSED

fpc-omp

Segmentation fault (core dumped)

gmm-omp

./main 1 data out 1
GPU error in queue 0x7f021eb76000 4111 (HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.)
Aborted (core dumped)

histogram-omp

PASS

hybridsort-omp

Segmentation fault (core dumped)

knn-omp

Precision accuracy 1.000000
Index accuracy 1.000000

Is this a pass?

lanczos-omp

./main -g data/gengraph.py -n 1 -k 1
nan/-nan

medianfilter-omp

PASS

nms-omp (Not sure what arguments are needed here, no input file seen in directory.)

./main
Usage: nmstest  <detections.txt>  <output.txt>

               detections.txt -> Input file containing the coordinates, width, and scores of detected objects
               output.txt     -> Output file after performing NMS

nw-omp

./nw 16 1
WG size of kernel = 16
Device offloading time = 0.356437(s)

Is this a pass?

pathfinder-omp

./main 10 10 10
Device offloading time = 0.354400(s)

Is this a pass?

particlefilter-omp

./main -x 10 -y 10 -z 10 -np 100
VIDEO SEQUENCE TOOK 0.000084
Device offloading time: 0.363332 (s)
PARTICLE FILTER TOOK 0.363475
ENTIRE PROGRAM TOOK 0.363559

Is this a pass?

particles-omp

Segmentation fault (core dumped)

quicksort-omp

hangs on compilation during llc step

radixsort-omp

Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
main.cpp:61:1: Libomptarget fatal error 1: failure of target construct while offloading is mandatory

recursiveGaussian-omp

Segmentation fault (core dumped)

reverse-omp

no output - pass?

scan-omp

PASS

sobol-omp

Segmentation fault (core dumped)

sort-omp

./main 1000 2
Segmentation fault (core dumped)

split-omp

main.cpp:16:10: fatal error: verify.cpp: No such file or directory
 #include "verify.cpp"

streamcluster-omp

./streamcluster 100 100 2 100 10 10 output.txt 1
Segmentation fault (core dumped)

transpose-omp

no output - pass?

tridiagonal-omp

 pcr_small_systems_kernel
  looping 100 times..
Tridiagonal-pcrsmall-base, Throughput = 7023.1347 Systems/s, Time = 0.00233 s, Size = 16384 Systems
  err = 0.6758

 pcr_branch_free_kernel
  looping 100 times..
Tridiagonal-pcrsmall-optimized, Throughput = 8100.3781 Systems/s, Time = 0.00202 s, Size = 16384 Systems
  err = 0.6758

 cyclic_small_systems_kernel
  looping 100 times..
Tridiagonal-cyclicsmall-base, Throughput = 8041.1464 Systems/s, Time = 0.00204 s, Size = 16384 Systems
  err = 0.3294

 cyclic_branch_free_kernel
  looping 100 times..
Tridiagonal-cyclicsmall-optimized, Throughput = 9650.4734 Systems/s, Time = 0.00170 s, Size = 16384 Systems
  err = 0.3294

sweep_small_systems_global_kernel
  looping 100 times..
Tridiagonal-sweepsmall-noreorder, Throughput = 5872.9074 Systems/s, Time = 0.00279 s, Size = 16384 Systems
  err = 0.3507

sweep_data_reorder_kernel
sweep_small_systems_global_kernel
  looping 100 times..
Tridiagonal-sweepsmall-reorder, Throughput = 2410.4358 Systems/s, Time = 0.00680 s, Size = 16384 Systems
  err = 0.3507

Is this a pass?

Jun 03 '21 22:06 estewart08

Sorry for the confusion; verification is not fully automated. I will update some of the examples as soon as possible so that they print a pass or fail message. For the other examples, I compared the HIP, OMP, and CUDA results. I will rerun the OMP examples that segfault in your list on Intel and NVIDIA GPUs.

Jun 03 '21 22:06 zjin-lcf

The atomicIntrinsics, axhelm, knn, reverse, transpose, and tridiagonal examples pass the test. For nw, pathfinder, and particlefilter, the OMP and HIP results match using the latest release. For nms, the input file, which is shared by all implementations, is located in the 'nms-cuda' folder; type "make -f Makefile.aomp run" to run the test. The OMP and HIP results match. For dxtc1, the input files, which are also shared, are located in the 'dxtc1-sycl/data' folder; type "make -f Makefile.aomp run" to run the test. The example passes the test. For d2q9-bgk, 'make check' compares the device and host results. The example passes the test.

I updated a few examples so that they print a clear pass or fail message. Thanks.

Jun 04 '21 12:06 zjin-lcf
