report AOMP issues
The examples in https://github.com/zjin-lcf/oneAPI-DirectProgramming that fail on an AMD GPU (gfx906) are listed below. The HIP version of each kernel may help when running the HIP and OMP programs side by side for comparison.
To reproduce, please type 'make -f Makefile.aomp run' in each program directory.
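In case it saves some typing, the per-directory make command can be scripted. This is only a rough sketch: the function names `find_program_dirs` and `run_program` and the 600-second timeout are my own assumptions, and it assumes each program directory sits directly under the repository root and contains a Makefile.aomp.

```python
import subprocess
from pathlib import Path


def find_program_dirs(root):
    """Return the sorted subdirectories of root that contain a Makefile.aomp."""
    return sorted(p.parent for p in Path(root).glob("*/Makefile.aomp"))


def run_program(directory, timeout_s=600):
    """Run 'make -f Makefile.aomp run' in one program directory.

    Returns (exit_code, combined_output); exit_code is None on timeout.
    timeout_s is an assumed value, useful because some examples hang.
    """
    try:
        proc = subprocess.run(
            ["make", "-f", "Makefile.aomp", "run"],
            cwd=directory,
            capture_output=True,
            text=True,
            errors="replace",  # tolerate non-UTF-8 bytes in program output
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Only the make process itself is killed here; a hung child may linger.
        return None, "timed out"
    return proc.returncode, proc.stdout + proc.stderr
```

Looping over `find_program_dirs(".")` from the repository root and calling `run_program` on each entry should give one exit code and one log per example.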
Where results mismatch, the differences are not trivial floating-point differences. A memory access fault does not necessarily mean that there is a bug in my OMP programs. Thank you for your reviews and tests.
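To make "not trivial" concrete, this is roughly the kind of comparison I have in mind. It is a sketch, not the check used in any of the examples: `results_match` is a hypothetical helper, and the tolerances `rel_tol=1e-6` and `abs_tol=1e-9` are assumed values that would need tuning per program.

```python
import math


def results_match(reference, computed, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if both sequences have the same length and every pair
    of values agrees within the given tolerances (assumed defaults)."""
    if len(reference) != len(computed):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, computed)
    )
```

A rounding-level difference such as 1.0 versus 1.0 + 1e-9 is accepted, while a NaN, an inf against a finite value, or an error on the order of the maxError values reported below is rejected.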
all-pairs-distance-omp (fixed)
results mismatch
asta-omp (fixed)
/home/usr/lib/aomp/lib/clang/13.0.0/include/__clang_hip_math.h:1342:11: error: declaration of anonymous class must be a definition
template <class T> __DEVICE__ T min(T __arg1, T __arg2) {
atomicIntrinsics-omp (enhancement)
support #pragma omp atomic compare
axhelm-omp (fixed) correctness check fails
./axhelm 1 8000 100 (fixed)
word size: 8 bytes
Correctness check: maxError = 1609.05
NRepetitions=100 Ndim=1 N=7 Nelements=8000 elapsed time=2.59332e+07 GDOF/s=0.10581 GB/s=13.8991 GFLOPS/s=18.6374
./axhelm 3 8000 100
word size: 8 bytes
Correctness check: maxError = 1634.36
NRepetitions=100 Ndim=3 N=7 Nelements=8000 elapsed time=5.61527e+07 GDOF/s=0.1466 GB/s=8.75327 GFLOPS/s=26.041
boxfilter-omp (fixed)
results mismatch
ced-omp (fixed)
results mismatch
compute-score-omp (fixed)
results mismatch
convolutionSeparable-omp
./main
[GPU Memory Error] Addr: 0x7f1f9b400000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x21170e0) on address 0x7f1f9b400000. Reason: Page not present or supervisor privilege.
crc64-omp
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_DEBUG=4 to dump host-target pointer mappings.
Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory
d2q9-bgk-omp (fixed)
results mismatch
python check/check.py --ref-av-vels-file=./check/256x256.av_vels.dat \
--ref-final-state-file=./check/256x256.final_state.dat \
--av-vels-file=./av_vels.dat \
--final-state-file=./final_state.dat
check/check.py:80: RuntimeWarning: invalid value encountered in divide
diff_pcnt = 100.0*(diff/(ref_vals - diff))
Total difference in av_vels : INF
Biggest difference (at step 0) : INF
-INF vs. 5.448322099360E-06 = nan%
()
Total difference in final_state : 2.900517986625E+00
Biggest difference (at coord (1,254)) : -4.527248199000E-05
3.327693790197E-02 vs. 3.323166541998E-02 = -0.14%
()
av_vels failed check
dct8x8-omp
results mismatch
dxtc1-omp (fixed)
results mismatch
fft-omp (fixed)
results mismatch
filter-omp (fixed)
[GPU Memory Error] Addr: 0x7f787ea0c000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0xb690e0) on address 0x7f787ea0c000. Reason: Page not present or supervisor privilege.
fpc-omp
results mismatch
gmm-omp
Starting with 2 cluster(s), will stop at 1 cluster(s).
[/home/release/git/aomp13/llvm-project/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp:277] GPU error in queue 0x7fabca0a0000 4111 (HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.)
histogram-omp (fixed)
LLVM ERROR: Cannot select: t12: i32,ch = AtomicLoadAdd<(load store monotonic 4 on %ir.arrayidx30.i.us.i, addrspace 5)> t53:1, t10, Constant:i32<1>
t10: i32 = add FrameIndex:i32<0>, t9
t7: i32 = FrameIndex<0>
t9: i32 = shl t53, Constant:i32<2>
t53: i32,ch = load<(load 1 from %ir.lsr.iv11, !tbaa !61, !noalias !78), zext from i8> t0, t2, undef:i64
t2: i64,ch = CopyFromReg t0, Register:i64 %185
t1: i64 = Register %185
t4: i64 = undef
t8: i32 = Constant<2>
t11: i32 = Constant<1>
In function: __omp_offloading_2c_6ba2790__Z16run_smem_atomicsILi1ELi256EhEdPT1_iiPjb_l56
hybridsort-omp (fixed)
LLVM ERROR: Cannot select: t25: i32,ch = AtomicLoadAdd<(load store monotonic 4 on %ir.arrayidx18.i.i, addrspace 5)> t82:1, t23, Constant:i32<1>
t23: i32 = add FrameIndex:i32<0>, t22
t20: i32 = FrameIndex<0>
t22: i32 = shl t19, Constant:i32<2>
t19: i32 = or t16, t18
t16: i32 = and t14, Constant:i32<1023>
t14: i32 = fp_to_uint t13
t13: f32 = fmul t81, ConstantFP:f32<1.024000e+03>
t81: f32 = DIV_FIXUP nofpexcept t80, t10, t8
t80: f32 = DIV_FMAS nofpexcept t79, t75, t78, t70:1
t79: f32 = fma nofpexcept t72, t78, t70
...
knn-omp (fixed)
results mismatch
lanczos-omp (fixed)
results mismatch
medianfilter-omp (fixed)
results mismatch
nms-omp (fixed)
results mismatch
nw-omp (fixed)
segmentation fault
pathfinder-omp (fixed)
[GPU Memory Error] Addr: 0x7f01a807a000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x16540f0) on address 0x7f01a807a000. Reason: Page not present or supervisor privilege.
particlefilter-omp (fixed)
values are "inf" in output.txt
particles-omp
hanging
quicksort-omp
[GPU Memory Error] Addr: 0x7f166815f000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0xdf00e0) on address 0x7f166815f000. Reason: Page not present or supervisor privilege.
radixsort-omp
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
recursiveGaussian-omp (fixed)
results mismatch
reverse-omp (fixed)
results mismatch
scan-omp (fixed)
results mismatch
sobol-omp (fixed)
results mismatch
sort-omp (fixed)
[GPU Memory Error] Addr: 0x7fee955af000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x16340e0) on address 0x7fee955af000. Reason: Page not present or supervisor privilege.
split-omp (fixed)
[GPU Memory Error] Addr: 0x7fa9e2083000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x18ba0e0) on address 0x7fa9e2083000. Reason: Page not present or supervisor privilege.
streamcluster-omp
results mismatch (the output file is output.txt)
transpose-omp (fixed)
results mismatch
tridiagonal-omp (fixed)
results mismatch (the results are “inf”)
metropolis-omp
clang-13: /home/release/git/aomp13/llvm-project/llvm/lib/IR/Constants.cpp:2468: static llvm::Constant* llvm::ConstantExpr::getICmp(short unsigned int, llvm::Constant*, llvm::Constant*, bool): Assertion `LHS->getType() == RHS->getType()' failed.
minimod
./main --grid 100 --nsteps 1000
[/home/release/git/aomp13/llvm-project/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp:277] GPU error in queue 0x7f6b0d1d6000 4111 (HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.)
sobel-omp
hanging
epistatis-omp
hanging
scan2-omp
./main 100 33554432 256
Executing kernel for 100 iterations
-------------------------------------------
Failed
vmc-omp
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
bonds-omp
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
reaction-omp (results do not match the expected output)
Components A | B
Min = 0.045745 | 0.000000
Max = 1.000006 | 0.000000
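A note on the "nan%" in the d2q9-bgk-omp output above: it follows from the quoted check.py expression once the device value is -INF. The sketch below uses plain Python floats rather than the arrays in check.py, and it assumes diff is the reference value minus the computed value. That line is not shown in this thread; the assumption is only inferred from the signs in the printed output.

```python
import math


def percent_difference(ref_val, computed_val):
    """Mirror of the quoted expression 100.0*(diff/(ref_vals - diff))."""
    # Assumption: diff = reference - computed (not shown in the thread).
    diff = ref_val - computed_val
    return 100.0 * (diff / (ref_val - diff))


# av_vels case: reference 5.448322099360E-06, computed -INF.
# diff is +inf, the denominator is -inf, and inf / -inf is nan.
av_vels_pct = percent_difference(5.448322099360e-06, -math.inf)

# final_state case: both values finite, giving about -0.14 percent.
final_state_pct = percent_difference(3.323166541998e-02, 3.327693790197e-02)
```

So the nan itself is just a symptom; the real problem is the -INF produced on the device at step 0.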
Ethan will review this with the OpenMP on LLVM team.
Okay. Please let me know if any of my OMP programs are not written correctly. Thanks.
Will provide an update using AOMP_13.0-3 dev. There is a fix in place from trunk that corrects many of the result mismatches.
I look forward to AOMP_13.0-3 dev.
These results are based on aomp_13.0-3 dev, which is a preview of the next release. Some of these programs produced no output, so I am not sure whether that means pass or fail. Most of the inputs I tried were not chosen for any specific reason. If you have suggested inputs, let me know.
all-pairs-distance-omp
PASS
PASS
asta-omp
/__clang_hip_math.h:1325:11: error: declaration of anonymous class must be a definition
atomicIntrinsics-omp
no output - pass?
axhelm-omp
./axhelm 1 8000 100
Correctness check: maxError = 0.000366211
./axhelm 3 8000 100
Correctness check: maxError = 0.000488281
Is this a pass?
boxfilter-omp
PASS
ced-omp
Test Passed
compute-score-omp
Verification: PASS
convolutionSeparable-omp
Memory access fault
crc64-omp
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory
d2q9-bgk-omp
./main Inputs/input_256x256.params Obstacles/obstacles_256x256.dat
==done==
Reynolds number: 1.006634616852E+01
Elapsed time: 11.479089 (s)
Is this a pass?
dct8x8-omp
FAIL
dxtc1-omp
main: main.cpp:47: int main(int, char **): Assertion `image_path != NULL' failed.
Aborted (core dumped)
fft-omp
Segmentation fault (core dumped)
filter-omp
Filter using shared memory PASSED
fpc-omp
Segmentation fault (core dumped)
gmm-omp
./main 1 data out 1
GPU error in queue 0x7f021eb76000 4111 (HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.)
Aborted (core dumped)
histogram-omp
PASS
hybridsort-omp
Segmentation fault (core dumped)
knn-omp
Precision accuracy 1.000000
Index accuracy 1.000000
Is this a pass?
lanczos-omp
./main -g data/gengraph.py -n 1 -k 1
nan/-nan
medianfilter-omp
PASS
nms-omp (Not sure what arguments are needed here, no input file seen in directory.)
./main
Usage: nmstest <detections.txt> <output.txt>
detections.txt -> Input file containing the coordinates, width, and scores of detected objects
output.txt -> Output file after performing NMS
nw-omp
./nw 16 1
WG size of kernel = 16
Device offloading time = 0.356437(s)
Is this a pass?
pathfinder-omp
./main 10 10 10
Device offloading time = 0.354400(s)
Is this a pass?
particlefilter-omp
./main -x 10 -y 10 -z 10 -np 100
VIDEO SEQUENCE TOOK 0.000084
Device offloading time: 0.363332 (s)
PARTICLE FILTER TOOK 0.363475
ENTIRE PROGRAM TOOK 0.363559
Is this a pass?
particles-omp
Segmentation fault (core dumped)
quicksort-omp
hangs on compilation during llc step
radixsort-omp
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
main.cpp:61:1: Libomptarget fatal error 1: failure of target construct while offloading is mandatory
recursiveGaussian-omp
Segmentation fault (core dumped)
reverse-omp
no output - pass?
scan-omp
PASS
sobol-omp
Segmentation fault (core dumped)
sort-omp
./main 1000 2
Segmentation fault (core dumped)
split-omp
main.cpp:16:10: fatal error: verify.cpp: No such file or directory
#include "verify.cpp"
streamcluster-omp
./streamcluster 100 100 2 100 10 10 output.txt 1
Segmentation fault (core dumped)
transpose-omp
no output - pass?
tridiagonal-omp
pcr_small_systems_kernel
looping 100 times..
Tridiagonal-pcrsmall-base, Throughput = 7023.1347 Systems/s, Time = 0.00233 s, Size = 16384 Systems
err = 0.6758
pcr_branch_free_kernel
looping 100 times..
Tridiagonal-pcrsmall-optimized, Throughput = 8100.3781 Systems/s, Time = 0.00202 s, Size = 16384 Systems
err = 0.6758
cyclic_small_systems_kernel
looping 100 times..
Tridiagonal-cyclicsmall-base, Throughput = 8041.1464 Systems/s, Time = 0.00204 s, Size = 16384 Systems
err = 0.3294
cyclic_branch_free_kernel
looping 100 times..
Tridiagonal-cyclicsmall-optimized, Throughput = 9650.4734 Systems/s, Time = 0.00170 s, Size = 16384 Systems
err = 0.3294
sweep_small_systems_global_kernel
looping 100 times..
Tridiagonal-sweepsmall-noreorder, Throughput = 5872.9074 Systems/s, Time = 0.00279 s, Size = 16384 Systems
err = 0.3507
sweep_data_reorder_kernel
sweep_small_systems_global_kernel
looping 100 times..
Tridiagonal-sweepsmall-reorder, Throughput = 2410.4358 Systems/s, Time = 0.00680 s, Size = 16384 Systems
err = 0.3507
Is this a pass?
Sorry for the confusion; verification is not fully automated. I will update some of the examples as soon as possible so that they print a pass or fail message. For the other examples, I compared the HIP, OMP, and CUDA results. I will also rerun, on Intel and Nvidia GPUs, the OMP examples that segfault in your list.
The atomicIntrinsics, axhelm, knn, reverse, transpose, and tridiagonal examples pass the test. For nw, pathfinder, and particlefilter, the OMP and HIP results match using the latest release. For nms, the input file, which is shared by the implementations, is located in the 'nms-cuda' folder; type "make -f Makefile.aomp run" to run the test. The OMP and HIP results match. For dxtc1, the input files, which are also shared, are located in the 'dxtc1-sycl/data' folder; type "make -f Makefile.aomp run" to run the test. The example passes the test. For d2q9-bgk, 'make check' compares the device and host results. The example passes the test.
I updated a few examples so that they print a clear pass or fail message. Thanks.
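Until every example prints such a message, a log could be triaged with something like the sketch below. The patterns are taken only from messages quoted in this thread, the `classify` helper is hypothetical, and anything that matches neither pattern (for example a run that prints only timings) is reported as "unknown" rather than guessed.

```python
import re

# Patterns taken from messages quoted in this thread; extend as needed.
FAIL_RE = re.compile(
    r"\bfail|segmentation fault|memory access fault|fatal error|HSA_STATUS_ERROR",
    re.IGNORECASE,
)
PASS_RE = re.compile(r"\bpass(ed)?\b", re.IGNORECASE)


def classify(output):
    """Return 'fail', 'pass', or 'unknown' for one program's output text."""
    # Failure markers are checked first so a crash after a PASS still fails.
    if FAIL_RE.search(output):
        return "fail"
    if PASS_RE.search(output):
        return "pass"
    return "unknown"
```

Examples in the "unknown" group still need a comparison against the HIP or host results.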