chipStar

hipConstantTestDeviceSymbol fails on Intel CPU

Open · pjaaskel opened this issue 3 years ago · 6 comments

This test case uses global variables that are initialized on the device side. With the Intel CPU driver (#142), it produces random output.

@linehill suspected the cause is that we lack the clSetKernelExecInfo() calls that declare the (generated) global-variable pointers as SVM pointers (CL_KERNEL_EXEC_INFO_SVM_PTRS). These pointers are not in the kernel argument list, but the kernel still uses them.
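To make that hypothesis concrete, here is a minimal C++ sketch of what such pointer passing could look like. It assumes each device global is backed by an SVM allocation whose pointer the runtime already knows. The `SvmGlobalRegistry` class, the `declareIndirectSvmPtrs` function and the `SKETCH_HAVE_OPENCL` build flag are hypothetical; only `clSetKernelExecInfo` and `CL_KERNEL_EXEC_INFO_SVM_PTRS` are real OpenCL 2.0+ API. This is not chipStar's actual implementation.

```cpp
// Sketch only: collect the SVM allocations that back device-side globals
// so they can be declared to the OpenCL runtime before each launch.
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef SKETCH_HAVE_OPENCL  // hypothetical flag: define it and link with -lOpenCL
#include <CL/cl.h>
#endif

// Hypothetical registry; not chipStar's actual data structure.
class SvmGlobalRegistry {
 public:
  // Record one SVM pointer (e.g. a clSVMAlloc result) that kernels may
  // dereference without receiving it as a kernel argument.
  void add(void* svmPtr) {
    assert(svmPtr != nullptr);
    ptrs_.push_back(svmPtr);
  }

  const std::vector<void*>& pointers() const { return ptrs_; }

  // Size in bytes of the pointer array, as clSetKernelExecInfo expects.
  std::size_t byteSize() const { return ptrs_.size() * sizeof(void*); }

 private:
  std::vector<void*> ptrs_;
};

#ifdef SKETCH_HAVE_OPENCL
// Declare every registered pointer for this kernel object (OpenCL 2.0+).
// Would need to run before clEnqueueNDRangeKernel for each kernel that
// may touch the globals.
cl_int declareIndirectSvmPtrs(cl_kernel kernel, const SvmGlobalRegistry& reg) {
  if (reg.pointers().empty()) return CL_SUCCESS;  // nothing to declare
  return clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_SVM_PTRS,
                             reg.byteSize(), reg.pointers().data());
}
#endif
```

Because the exec info is attached to a `cl_kernel` object, a runtime following this approach would have to repeat the call for every kernel that can reach the globals, possibly including whatever kernels initialize them.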

The problem goes away when I print out the variables in the kernel. This suggests an overly eager optimizer converts the SVM-backed globals into program-scope globals (or something similar), so @linehill might be right. The test also passes with the GPU driver, possibly only because that driver happens not to optimize the "SVM globals" away.

```cpp
#include <hip/hip_runtime.h>

__constant__ __device__ int ConstOut = 123;
__constant__ __device__ int ConstIn = 321;

__global__ void Assign(int* Out) {
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  printf("ConstOut: %d\n", ConstOut); // I added this
  printf("ConstIn: %d\n", ConstIn);   // ...and this
  if (tid == 0) {
    Out[tid] = -ConstIn;
    // ...and this (only thread 0 has written its element)
    printf("Out[tid]: %d\n", Out[tid]);
  }
}
```

pjaaskel · Oct 25 '22 14:10

It seems we lack any kind of SVM pointer info passing in the code base?

pjaaskel · Oct 25 '22 14:10

Still open?

pvelesko · May 04 '23 11:05

Does it fail still? Can we enable the test on Intel/PoCL CPU?

pjaaskel · May 04 '23 13:05

Yes, still failing:

```
test 69
    Start 69: hipConstantTestDeviceSymbol

69: Test command: /home/pvelesko/hipstar/hipstar/build/samples/hipSymbol/hipTestConstantDeviceSymbol
69: Test timeout computed to be: 10000000
69: hipTestConstantDeviceSymbol: /home/pvelesko/hipstar/hipstar/samples/hipSymbol/hipTestConstantDeviceSymbol.cpp:46: int main(): Assertion `Ch == -654' failed.
1/1 Test #69: hipConstantTestDeviceSymbol ......Subprocess aborted***Exception:   0.20 sec

0% tests passed, 1 tests failed out of 1

Label Time Summary:
internal    =   0.20 sec*proc (1 test)

Total Test time (real) =   0.57 sec

The following tests FAILED:
	 69 - hipConstantTestDeviceSymbol (Subprocess aborted)
```

pvelesko · May 04 '23 13:05

Still fails on the Intel CPU driver; PoCL CPU works. Do we set the SVM markups for the global initializers correctly, @linehill? It might be a driver bug too.

pjaaskel · May 29 '23 13:05

Dropped from Milestone 1.0: resolution depends on external issue.

linehill · Jun 09 '23 07:06