OpenACCV-V
Feedback from NVHPC
Hi. Thank you for the OpenACC V&V suite. I work on the NVHPC compiler and have checked all the failures reported for nvhpc 23.1 ( https://crpl.cis.udel.edu/oaccvv/results/ ). I'd like to comment on the implementation to help improve the quality of the suite.
This line should be IF (abs(c(x) - (a(x) + b(x))) .gt. PRECISION) THEN
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/acc_copyin_with_len.F90#L299
DO x = 1, acc_get_num_devices(acc_get_device_type())
should be DO x = 0, acc_get_num_devices(acc_get_device_type())-1 (device numbers start at 0)
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/acc_get_device_num.F90
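In C, the same zero-based loop over every device of the current type might look like the sketch below. It assumes an OpenACC-enabled build, where it uses acc_get_num_devices, acc_set_device_num and acc_get_device_num from openacc.h. Without OpenACC it compiles to a no-op. check_device_numbering is a hypothetical name, not part of the suite.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical check: visits every valid device number of the current
 * device type, 0 .. num_devices - 1, and returns the number of errors.
 * Without OpenACC support it does nothing and reports no errors. */
static int check_device_numbering(void)
{
    int err = 0;
#ifdef _OPENACC
    acc_device_t type = acc_get_device_type();
    int num_devices = acc_get_num_devices(type);
    for (int x = 0; x < num_devices; ++x) {   /* zero-based, not 1..N */
        acc_set_device_num(x, type);
        if (acc_get_device_num(type) != x) {
            err += 1;
        }
    }
#endif
    return err;
}
```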
This test seems to be running on a single-GPU environment.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/acc_memcpy_d2d.c
This length is too big to complete the test.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_capture_assign_expr_divided_x.F90
atomic_capture_assign_expr_minus_x has the same issue. Also, the atomic operation is not necessary because the innermost loop is not parallelized automatically.
atomic_capture_assign_expr_plus_x does not include acc_testsuite.Fh, so PRECISION is undefined.
The runtime failures in the tests from atomic_capture_assign_expr_plus_x through atomic_capture_x_plus_expr_assign.F90 have the same cause.
atomic_update_expr_divided_x.F90 and atomic_update_expr_minus_x.F90 fail for the same reason: they need PRECISION defined, and length=10 is too large (length=5 works).
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_update_expr_divided_x.F90
#define PRECISION 1e-8
is too restrictive for checking real_t results, as in: if (fabs(b[x] - (a[x] * prev)) < PRECISION){ ...
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_structured_expr_multiply_x_assign.c
The tests from atomic_structured_expr_multiply_x_assign.c through atomic_structured_x_multiply_expr_assign.cpp fail for the same reason.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_structured_x_multiply_expr_assign.cpp
kernels_if test3 is in the same situation.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/kernels_if.F90
parallel_loop_reduction_add_general_type_check_pt2 has the same problem.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/parallel_loop_reduction_add_general_type_check_pt2.c
The same case:
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_firstprivate.F90
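One way to make such checks robust is to scale the tolerance with the magnitude of the expected value. The sketch below assumes real_t may be float. In that case the smallest step above 1.0 is FLT_EPSILON, about 1.19e-7, which is already far above 1e-8. close_enough and FLOAT_REL_TOL are hypothetical names, not part of acc_testsuite.h.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical tolerance for single-precision results: a few units of
 * FLT_EPSILON (about 1.19e-7), already larger than a fixed 1e-8. */
#define FLOAT_REL_TOL (4.0 * FLT_EPSILON)

/* Hypothetical helper: accepts `observed` if it lies within rel_tol of
 * `expected`, relative to the magnitude of `expected`; abs_floor keeps
 * the check meaningful when `expected` is zero or tiny. */
static bool close_enough(double expected, double observed,
                         double rel_tol, double abs_floor)
{
    double tol = rel_tol * fabs(expected);
    if (tol < abs_floor) {
        tol = abs_floor;
    }
    return fabs(observed - expected) <= tol;
}
```

A relative bound keeps the check equally strict for values near 1 and near 1000, which a fixed absolute PRECISION cannot do.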
These tests should not depend on the order of evaluation of the clauses, as they do in #pragma acc parallel loop copy(test[0:n]) copyout(test[0:n]). The specifications say:
629-630 • A program must not depend on the order of evaluation of the clauses, or on any side effects of the evaluations.
( https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.6.final.pdf )
1056-1057 • A program must not depend on the order of evaluation of the clauses or on any side effects of the evaluations.
( https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf )
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/copy_copyout.c
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/copyin_copyout.c
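If a test needs an array both copied to the device and copied back, a single copy clause expresses that without any dependence on clause evaluation order. Below is a minimal sketch. increment_all is a hypothetical kernel, not the suite's code.

```c
/* Hypothetical kernel: one data clause per array. copy() copies
 * test[0:n] to the device on entry and back to the host on exit, so no
 * second clause (and no clause ordering) is involved. A compiler without
 * OpenACC support ignores the pragma and runs the loop on the host. */
static void increment_all(int *test, int n)
{
    #pragma acc parallel loop copy(test[0:n])
    for (int x = 0; x < n; ++x) {
        test[x] += 1;
    }
}
```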
parallel_create_zero has an incomplete line (the closing parenthesis is missing): #pragma acc data copyin(a[0:n]) copyout(b[0:n]
(Also, we do not support the zero modifier yet.)
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/parallel_create_zero.c
The following tests are not complete.
This test refers to an undefined variable, 'scalar', and redeclares a variable and some types.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_copyin.c
The Fortran version has similar issues.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_copyin.F90
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_create.F90
This one does not contain a main program, so it should be excluded from the tests.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_copyin_mod.F90
This one also refers to the undefined 'scalar'.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_device_resident.c
(In general, should the C++ version be a symlink to the C version?)
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_device_resident.cpp
The tests from declare_function_scope_copy to declare_function_scope_present are missing the allocation of a, b, and c.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_function_scope_copy.c
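Here is a minimal sketch of the missing step, assuming the arrays are heap-allocated doubles of length n. It shows only the allocation and its use in a compute region. The suite's declare directives are deliberately left out, and check_allocated_arrays is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical sketch: a, b and c are allocated (and initialized) before
 * any OpenACC directive refers to them. Returns the number of errors. */
static int check_allocated_arrays(int n)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a); free(b); free(c);
        return 1;
    }
    for (int x = 0; x < n; ++x) {
        a[x] = x;
        b[x] = 2.0 * x;
        c[x] = 0.0;
    }

    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int x = 0; x < n; ++x) {
        c[x] = a[x] + b[x];
    }

    int err = 0;
    for (int x = 0; x < n; ++x) {
        if (c[x] != 3.0 * x) {
            err += 1;
        }
    }
    free(a); free(b); free(c);
    return err;
}
```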
This loop should start from 0.
for (int x = 1; x < n; ++x){
host_b = host_b | a[x];
}
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/kernels_loop_reduction_bitor_general.cpp#L34
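Below is a sketch of a host reference and a device reduction that agree, assuming the data are unsigned integers. Starting the host loop at 1 silently drops a[0] from the expected value. host_bitor and device_bitor are hypothetical names.

```c
/* Host reference for a bitwise-OR reduction: the loop starts at 0 and
 * the accumulator starts at 0, the identity value for |. */
static unsigned int host_bitor(const unsigned int *a, int n)
{
    unsigned int host_b = 0;
    for (int x = 0; x < n; ++x) {
        host_b = host_b | a[x];
    }
    return host_b;
}

/* Device side: the same reduction with an OpenACC reduction clause.
 * A compiler without OpenACC support runs it serially on the host. */
static unsigned int device_bitor(const unsigned int *a, int n)
{
    unsigned int b = 0;
    #pragma acc kernels loop copyin(a[0:n]) reduction(|:b)
    for (int x = 0; x < n; ++x) {
        b = b | a[x];
    }
    return b;
}
```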
This test is requesting gang-level reduction instead of loop-level reduction. I think it is not valid according to the spec.
1478-1484 The reduction clause specifies a reduction operator and one or more scalar variables. For each reduction variable, a private copy is created in the same manner as for a private clause on the loop construct, and initialized for that operator; see the table in Section 2.5.12 reduction clause. At the end of the loop, the values for each thread are combined using the specified reduction operator, and the result combined with the value of the original variable and stored in the original variable at the end of the parallel or kernels region if the loop has gang parallelism, and at the end of the loop otherwise.
( https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.6.final.pdf )
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/kernels_num_gangs.F90
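The quoted rule is easiest to see with the reduction clause on a gang loop itself, as in the C sketch below. It assumes the intent is to sum an array under a num_gangs clause. Because the loop has gang parallelism, the result is only guaranteed in sum after the kernels region ends, which is where the sketch reads it. sum_array is a hypothetical name, and num_gangs(8) is an arbitrary choice.

```c
/* Hypothetical sketch: loop-level reduction on a gang-partitioned loop
 * inside a kernels region. A compiler without OpenACC support ignores
 * the pragmas and computes the same sum serially. */
static long sum_array(const int *a, int n)
{
    long sum = 0;
    #pragma acc kernels num_gangs(8) copyin(a[0:n])
    {
        #pragma acc loop gang reduction(+:sum)
        for (int x = 0; x < n; ++x) {
            sum += a[x];
        }
    }
    /* The combined value is available here, after the region. */
    return sum;
}
```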
There is an incompatibility between OpenACC 2.6 and 3.2 regarding null copyout (copyout of data that is not present on the device). NVHPC supports 2.6, but the following test assumes 3.2.
1225 – If var is not present on the current device, a runtime error is issued.
( https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.6.final.pdf )
1745 – If the appropriate reference counter for var is zero, no action is taken.
( https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf )
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/reference_count_zero.c
In this test, many threads illegally call a gang routine from the parallel construct without a loop.
2876 gang clause
2877-2881 The gang clause specifies that the procedure contains, may contain, or may call another procedure that contains a loop with a gang clause. A call to this procedure must appear in code that is executed in gang-redundant mode, and all gangs must execute the call. For instance, a procedure with a routine gang directive may not be called from within a loop that has a gang clause. Only one of the gang, worker, vector and seq clauses may appear for each device type.
( https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf )
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/routine_gang.c
INTEGER:: multiplier
should be INTEGER:: multiplier = 1
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_gang_blocking.F90
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_vector_blocking.F90
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_worker_blocking.F90
b and host_b need initialization.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_reduction_bitor_general.F90
This needs the declaration REAL(8):: maxval = 0.0, host_max = 0.0
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_reduction_max_general.F90
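Initialization matters because, per the quoted reduction rule, the reduced value is combined with the original value of the variable. Below is a C sketch of the same pattern. It assumes non-negative data, so 0.0 is a safe starting value. For arbitrary data, -DBL_MAX or the first element would be safer. device_max and host_reference_max are hypothetical names.

```c
/* Hypothetical C version: both the device reduction variable and the
 * host reference start from the same explicit value. */
static double device_max(const double *a, int n)
{
    double maxval = 0.0;   /* combined with the private copies at the end */
    #pragma acc serial loop copyin(a[0:n]) reduction(max:maxval)
    for (int x = 0; x < n; ++x) {
        maxval = (a[x] > maxval) ? a[x] : maxval;
    }
    return maxval;
}

static double host_reference_max(const double *a, int n)
{
    double host_max = 0.0;  /* same starting value as the device side */
    for (int x = 0; x < n; ++x) {
        host_max = (a[x] > host_max) ? a[x] : host_max;
    }
    return host_max;
}
```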
IF (a(x, y) .eq. 1) THEN
should be IF (a(x, y) .eqv. .TRUE.) THEN
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_reduction_and_loop.F90
The kernel needs a present(a, b, d) clause.
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_private.c
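Below is a minimal sketch of the pattern, assuming a, b and d are heap arrays placed on the device by an enclosing data region. In C the clause names array sections rather than bare pointers. add_arrays is a hypothetical kernel, not the test's actual computation.

```c
/* Hypothetical sketch: the enclosing data region creates the device
 * copies, and present(...) on the compute construct states that they
 * must already be there instead of relying on implicit data attributes.
 * A compiler without OpenACC support runs the loop on the host. */
static void add_arrays(const double *a, const double *b, double *d, int n)
{
    #pragma acc data copyin(a[0:n], b[0:n]) copyout(d[0:n])
    {
        #pragma acc serial loop present(a[0:n], b[0:n], d[0:n])
        for (int x = 0; x < n; ++x) {
            d[x] = a[x] + b[x];
        }
    }
}
```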
IF (abs(d(x) - temp) .gt. (2 * PRECISION * LOOPCOUNT)) THEN
should be IF (abs(d(y) - temp) .gt. (2 * PRECISION * LOOPCOUNT)) THEN
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_private.F90
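This test seems to be incorrect: after set device_type(host), it expects acc_get_device_type() to still return the original device type.
With NVHPC: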
int device_type = acc_get_device_type(); // 4
#pragma acc set device_type(host) // 2
if (acc_get_device_type() != device_type){ // 2 != 4
err += 1;
}
#pragma acc set device_type(default) → nvidia → 4
#pragma acc set device_type(host) → 2
#pragma acc set device_type(multicore) → 2
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/set_device_type.c
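Below is one possible corrected check, assuming the test's intent is to verify that set device_type(host) really switches to the host device type. It compares against acc_device_host rather than against the type that was current before the switch, then restores the original type with acc_set_device_type. check_set_host is a hypothetical name, and without OpenACC the function compiles to a no-op.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical corrected check; returns the number of errors. */
static int check_set_host(void)
{
    int err = 0;
#ifdef _OPENACC
    acc_device_t original = acc_get_device_type();
    #pragma acc set device_type(host)
    if (acc_get_device_type() != acc_device_host) {
        err += 1;
    }
    acc_set_device_type(original);   /* restore for later tests */
#endif
    return err;
}
```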
This one is missing the declaration INTEGER :: device_num
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/set_device_type_num.F90
wait_if is missing an #ifndef T2 guard. (Also, we do not support wait if yet.)
https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/wait_if.c
We will refine our compilers to fix the rest of the failures. Thanks a lot.