TornadoVM
Several benchmarks fail when running on MacOS due to ambiguous functions
Describe the bug
Running tornado-benchmarks.py fails with calls to ambiguous functions:
- nbody: call to '__cl_sqrt' is ambiguous
- dgemm: call to '__cl_fma' is ambiguous
- dft: call to '__fast_relax_sin' is ambiguous; call to '__fast_relax_cos' is ambiguous
There are a couple of other failures (see output), but I thought I would limit this bug report to the ambiguous function failures.
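To separate the ambiguous-call failures from the other errors in a long benchmark log, a small filter along the following lines might help. It is only a sketch: the regular expression is modelled on the clang diagnostics quoted in this report, and the function name `ambiguous_calls` is made up for illustration.

```python
import re
from collections import Counter

# Matches clang diagnostics of the form seen in this report, for example:
#   <program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
AMBIGUOUS = re.compile(r"error: call to '([^']+)' is ambiguous")


def ambiguous_calls(log_text):
    """Count how often each function is reported as ambiguous in a log."""
    return Counter(AMBIGUOUS.findall(log_text))
```

Assuming the benchmark output has been saved to a file such as the attached benchmark.txt, calling `ambiguous_calls(open("benchmark.txt").read())` would give one count per affected intrinsic.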
How To Reproduce
Run tornado-benchmarks.py
Expected behavior
All benchmarks should run.
benchmark.txt
Computing system setup (please complete the following information):
- OS: macOS Catalina 10.15.7, MacBook Pro, 6-core Intel i7, 16 GB memory, Radeon Pro 560X 4 GB
- OpenCL Version : 1.2
- TornadoVM commit id : e8c3161
Additional context
It seems that the macOS OpenCL compiler cannot determine which overload of the function is being called. We might need to explicitly typecast the arguments for the different variants of these intrinsics.
https://stackoverflow.com/questions/28851608/opencl-call-to-a-built-in-function-is-ambiguous
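The following toy model sketches why an explicit cast could help. It assumes the failing kernels pass a double to the intrinsic, which the `_Z10native_cosd` symbol further down in this thread suggests, while the build log lists only float overloads as candidates. The rule below is a deliberate simplification and not the full clang overload-resolution algorithm; `resolve_overload` and `AmbiguousCall` are hypothetical names.

```python
# Overloads the build log lists as candidates for __fast_relax_sin
# (float variants only; no double overload appears among them).
CANDIDATES = ("float", "float2", "float3", "float4", "float8", "float16")


class AmbiguousCall(Exception):
    """Raised when no single overload is the best match."""


def resolve_overload(arg_type, candidates=CANDIDATES):
    """Pick an overload for one argument under a simplified rule.

    Assumption (not the full clang algorithm): an exact type match wins;
    otherwise every candidate needs an implicit conversion of equal rank,
    so two or more candidates leave no unique best match.
    """
    if arg_type in candidates:
        return arg_type
    if len(candidates) == 1:
        return candidates[0]
    raise AmbiguousCall(f"call is ambiguous for argument of type {arg_type}")
```

Under this model `resolve_overload("float")` selects the scalar float overload, while `resolve_overload("double")` raises `AmbiguousCall`. That mirrors the difference between `sin((float) d_21)` and `sin(d_21)` in the generated kernel. Note that such a cast narrows the value to single precision.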
We will work on this.
Does this mean no trigonometric function can currently be used on macOS? We develop on macOS and run on Linux, so it would be nice to have a workaround if one exists.
(Same behaviour on Monterey):
tornado --devices
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Number of Tornado drivers: 1
Driver: OpenCL
Total number of OpenCL devices : 3
Tornado device=0:0
OpenCL -- [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
Global Memory Size: 32.0 GB
Local Memory Size: 32.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: 1024
Max WorkGroup Configuration: [1024, 1, 1]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:1
OpenCL -- [Apple] -- Intel(R) UHD Graphics 630
Global Memory Size: 1.5 GB
Local Memory Size: 64.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: 256
Max WorkGroup Configuration: [256, 256, 256]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:2
OpenCL -- [Apple] -- AMD Radeon Pro 5500M Compute Engine
Global Memory Size: 8.0 GB
Local Memory Size: 64.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: 256
Max WorkGroup Configuration: [256, 256, 256]
Device OpenCL C version: OpenCL C 1.2
Hi @yazun, if you run on Linux it should be ok to use the trigonometric functions.
We have seen some inconsistencies when running benchmarks and unit tests on macOS, mainly because of the old OpenCL drivers for this OS. For example, one of the latest issues we encountered is the lack of support for native trigonometric functions. This would affect you only if you run tests on your development setup. Is this the case?
Yes, we see sin/cos work fine on Linux and all tests/benchmarks pass ok (see https://github.com/beehive-lab/TornadoVM/issues/145) with JDK17. Since we rely heavily on the trigonometric support it is an issue for development and indeed affects us.
I see. Based on the information about your setup it seems that your driver supports OpenCL 1.2. This version offers support for the native trigonometric functions based on the standard (Page 252). So, it should not be a problem for you.
In my case, I have an older version of the driver and this is what causes the problem. Otherwise, the generated code from the TornadoVM JIT compiler should work as it respects the OpenCL standard.
Can you please confirm whether sin/cos or native_sin/native_cos fail on your macOS platform?
Good to hear it's not a limitation of the driver.
I am not sure how to control which variant is used; I tried both -Dtornado.enable.nativeFunctions=[False|True]
tornado -Dtornado.enable.nativeFunctions=False -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.vector, jdk.incubator.foreign
bm=dft-15-8192 , id=java-reference , average=3.876110e+09, median=3.855491e+09, firstIteration=4.149909e+09, best=3.756110e+09
bm=dft-15-8192 , device=0:0 , average=2.482360e+08, median=2.457139e+08, firstIteration=3.457622e+08, best=2.258864e+08, speedupAvg=15.6146, speedupMedian=15.6910, speedupFirstIteration=12.0022, CV=-0.0000%, deviceName= [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0x1024500 (Intel(R) UHD Graphics 630) (err:-2)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
d_22 = sin(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4765:22: note: expanded from macro 'sin'
#define sin(__x) __fast_relax_sin(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_sin, native_sin, __cl_sin);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous
d_24 = cos(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4758:22: note: expanded from macro 'cos'
#define cos(__x) __fast_relax_cos(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_cos, native_cos, __cl_cos);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:46:16: error: call to '__cl_fma' is ambiguous
d_26 = fma(d_22, d_23, d_25);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
<program source>:49:16: error: call to '__cl_fma' is ambiguous
d_29 = fma(d_24, d_15, d_28);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_INVALID_VALUE] : OpenCL Error : clGetProgramBuildInfo failed: return buffer size (8192 bytes) was too small to hold the result: 11019 bytes
[TornadoVM-OCL-JNI] ERROR : clGetProgramBuildInfo -> Returned: -30
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[[email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
at [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
at [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
at [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Then with native functions enabled:
tornado -Dtornado.enable.nativeFunctions=True -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.vector, jdk.incubator.foreign
bm=dft-15-8192 , id=java-reference , average=3.759722e+09, median=3.715212e+09, firstIteration=4.117996e+09, best=3.579663e+09
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz) (err:-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
kernel referenced external symbol '_Z10native_cosd' which could not be found.
kernel referenced external symbol '_Z10native_sind' which could not be found.
[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[[email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
at [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
at [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
at [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
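The missing symbols in the native-functions run can be decoded by hand. The sketch below assumes they follow the Itanium C++ ABI mangling scheme (`_Z`, the name length, the name, then one code per parameter type) and handles only that simple shape. `demangle_simple` is a hypothetical helper written for this note, not a TornadoVM or OpenCL API.

```python
import re

# Subset of the builtin type codes from the Itanium C++ ABI mangling scheme.
BUILTIN_TYPES = {
    "v": "void", "c": "char", "h": "unsigned char",
    "s": "short", "t": "unsigned short",
    "i": "int", "j": "unsigned int",
    "l": "long", "m": "unsigned long",
    "f": "float", "d": "double",
}


def demangle_simple(symbol):
    """Decode _Z<length><name><type codes>; return None for anything else."""
    match = re.fullmatch(r"_Z(\d+)(\w+)", symbol)
    if match is None:
        return None
    length, rest = int(match.group(1)), match.group(2)
    name, codes = rest[:length], rest[length:]
    # Reject truncated names, missing parameters, and unknown type codes.
    if len(name) < length or not codes or any(c not in BUILTIN_TYPES for c in codes):
        return None
    params = ", ".join(BUILTIN_TYPES[c] for c in codes)
    return f"{name}({params})"
```

Read this way, `_Z10native_cosd` and `_Z10native_sind` are the double-precision variants `native_cos(double)` and `native_sin(double)`, which would be consistent with the kernel calling these functions with double arguments.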
Specifying the device does not change much, e.g. for the discrete GPU (although it worked for OpenCL on the CPU):
tornado -Ds0.t0.device=0:2 -Dtornado.enable.nativeFunctions=False -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
bm=dft-15-8192 , id=java-reference , average=3.774364e+09, median=3.777630e+09, firstIteration=3.948355e+09, best=3.605918e+09
bm=dft-15-8192 , device=0:0 , average=2.213122e+08, median=2.150051e+08, firstIteration=2.796697e+08, best=2.063883e+08, speedupAvg=17.0545, speedupMedian=17.5700, speedupFirstIteration=14.1179, CV=-0.0000%, deviceName= [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0x1024500 (Intel(R) UHD Graphics 630) (err:-2)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
d_22 = sin(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4765:22: note: expanded from macro 'sin'
#define sin(__x) __fast_relax_sin(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_sin, native_sin, __cl_sin);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous
d_24 = cos(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4758:22: note: expanded from macro 'cos'
#define cos(__x) __fast_relax_cos(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_cos, native_cos, __cl_cos);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:46:16: error: call to '__cl_fma' is ambiguous
d_26 = fma(d_22, d_23, d_25);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
<program source>:49:16: error: call to '__cl_fma' is ambiguous
d_29 = fma(d_24, d_15, d_28);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_INVALID_VALUE] : OpenCL Error : clGetProgramBuildInfo failed: return buffer size (8192 bytes) was too small to hold the result: 11019 bytes
[TornadoVM-OCL-JNI] ERROR : clGetProgramBuildInfo -> Returned: -30
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[[email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
at [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
at [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
at [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
So could the ambiguity come from the fact that there are two GPUs? Is it possible to target just one of them to avoid this problem?
Can you print the kernel, please, with --printKernel?
The error that you get ([CL_BUILD_ERROR] : OpenCL Build Error) occurs because the OpenCL driver cannot compile the kernel. However, if the kernel works on Linux for a similar device type, the problem is most likely an OS or driver issue.
In this case, my next step would be to write an OpenCL kernel that uses the same function calls, build it from a C++ host program, and see whether the OpenCL driver builds the program.
I do not believe that the problem is related to the number of GPUs. If you want to force execution on a particular device, you can do so with this flag: -D<s>.<t>.device=<driverNumber>:<deviceNumber>. See more here.
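As a rough sketch of how that flag is filled in, the hypothetical helper below (DeviceFlag is not part of TornadoVM) substitutes values into the template. It assumes that <s> is the task-schedule name and <t> is the task name, and that the DFT benchmark task is the benchmark.t0 named in the exception message above. Whether the benchmark runner honours the setting is not confirmed in this thread.

```java
// Hypothetical helper for illustration only; it is not a TornadoVM API.
final class DeviceFlag {

    private DeviceFlag() {
    }

    /**
     * Fills in the template -D<s>.<t>.device=<driverNumber>:<deviceNumber>.
     * scheduleName stands for <s> and taskName for <t> (assumed meanings).
     */
    static String forTask(String scheduleName, String taskName, int driverNumber, int deviceNumber) {
        if (scheduleName == null || scheduleName.isEmpty() || taskName == null || taskName.isEmpty()) {
            throw new IllegalArgumentException("schedule and task names must be non-empty");
        }
        if (driverNumber < 0 || deviceNumber < 0) {
            throw new IllegalArgumentException("driver and device numbers must be non-negative");
        }
        return "-D" + scheduleName + "." + taskName + ".device=" + driverNumber + ":" + deviceNumber;
    }

    public static void main(String[] args) {
        // Device 0:1 is the second entry reported by `tornado --devices`.
        System.out.println(forTask("benchmark", "t0", 0, 1));
    }
}
```

Under those assumptions, the resulting string (-Dbenchmark.t0.device=0:1) would be passed to the tornado command alongside the other -D options.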
Could you clarify what -D<s>.<t>.device should be for the benchmarks, for example to select device 0:1?
Kernel included:
tornado --printKernel -Dtornado.enable.nativeFunctions=False -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
bm=dft-15-8192 , id=java-reference , average=3.694428e+09, median=3.697348e+09, firstIteration=3.965296e+09, best=3.548463e+09
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
_frame[0] = (ulong) _heap_base;
} // kernel
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
ulong ul_21, ul_40, ul_41, ul_0, ul_1, ul_2, ul_3, ul_19;
long l_17, l_18, l_16, l_37, l_38, l_39;
double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_20, d_22, d_12, d_14, d_13, d_32, d_31, d_34, d_33, d_35;
int i_9, i_8, i_7, i_6, i_5, i_4, i_36, i_15, i_11, i_10, i_42;
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
ul_0 = (ulong) _frame[3];
ul_1 = (ulong) _frame[4];
ul_2 = (ulong) _frame[5];
ul_3 = (ulong) _frame[6];
i_4 = get_global_size(0);
i_5 = i_4 + 8191;
i_6 = i_5 / i_4;
i_7 = get_global_id(0);
i_8 = i_6 * i_7;
i_9 = i_8 + i_6;
i_10 = min(i_9, 8192);
// BLOCK 1 MERGES [0 5 ]
i_11 = i_8;
for(;i_11 < i_10;)
{
// BLOCK 2
d_12 = (double) i_11;
// BLOCK 3 MERGES [2 4 ]
d_13 = 0.0;
d_14 = 0.0;
i_15 = 0;
for(;i_15 < 8192;)
{
// BLOCK 4
l_16 = (long) i_15;
l_17 = l_16 << 3;
l_18 = l_17 + 24L;
ul_19 = ul_0 + l_18;
d_20 = *((__global double *) ul_19);
ul_21 = ul_1 + l_18;
d_22 = *((__global double *) ul_21);
d_23 = (double) i_15;
d_24 = d_23 * 6.283185307179586;
d_25 = d_24 * d_12;
d_26 = d_25 / 8192.0;
d_27 = sin(d_26);
d_28 = -d_20;
d_29 = cos(d_26);
d_30 = d_29 * d_22;
d_31 = fma(d_27, d_28, d_30);
d_32 = d_14 + d_31;
d_33 = d_27 * d_22;
d_34 = fma(d_29, d_20, d_33);
d_35 = d_13 + d_34;
i_36 = i_15 + 1;
d_13 = d_35;
d_14 = d_32;
i_15 = i_36;
} // B4
// BLOCK 5
l_37 = (long) i_11;
l_38 = l_37 << 3;
l_39 = l_38 + 24L;
ul_40 = ul_2 + l_39;
*((__global double *) ul_40) = d_13;
ul_41 = ul_3 + l_39;
*((__global double *) ul_41) = d_14;
i_42 = i_11 + 1;
i_11 = i_42;
} // B5
// BLOCK 6
return;
} // kernel
bm=dft-15-8192 , device=0:0 , average=2.073738e+08, median=2.048341e+08, firstIteration=2.683069e+08, best=1.915486e+08, speedupAvg=17.8153, speedupMedian=18.0505, speedupFirstIteration=14.7790, CV=-0.0000%, deviceName= [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
_frame[0] = (ulong) _heap_base;
} // kernel
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
ulong ul_36, ul_14, ul_0, ul_16, ul_1, ul_2, ul_3, ul_35;
long l_13, l_11, l_12, l_33, l_34, l_32;
double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_15, d_18, d_17, d_20, d_19, d_22, d_21, d_8, d_7, d_9;
int i_6, i_5, i_37, i_4, i_31, i_10;
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
ul_0 = (ulong) _frame[3];
ul_1 = (ulong) _frame[4];
ul_2 = (ulong) _frame[5];
ul_3 = (ulong) _frame[6];
i_4 = get_global_size(0);
i_5 = get_global_id(0);
// BLOCK 1 MERGES [0 5 ]
i_6 = i_5;
for(;i_6 < 8192;)
{
// BLOCK 2
d_7 = (double) i_6;
// BLOCK 3 MERGES [2 4 ]
d_8 = 0.0;
d_9 = 0.0;
i_10 = 0;
for(;i_10 < 8192;)
{
// BLOCK 4
l_11 = (long) i_10;
l_12 = l_11 << 3;
l_13 = l_12 + 24L;
ul_14 = ul_0 + l_13;
d_15 = *((__global double *) ul_14);
ul_16 = ul_1 + l_13;
d_17 = *((__global double *) ul_16);
d_18 = (double) i_10;
d_19 = d_18 * 6.283185307179586;
d_20 = d_19 * d_7;
d_21 = d_20 / 8192.0;
d_22 = sin(d_21);
d_23 = -d_15;
d_24 = cos(d_21);
d_25 = d_24 * d_17;
d_26 = fma(d_22, d_23, d_25);
d_27 = d_9 + d_26;
d_28 = d_22 * d_17;
d_29 = fma(d_24, d_15, d_28);
d_30 = d_8 + d_29;
i_31 = i_10 + 1;
d_8 = d_30;
d_9 = d_27;
i_10 = i_31;
} // B4
// BLOCK 5
l_32 = (long) i_6;
l_33 = l_32 << 3;
l_34 = l_33 + 24L;
ul_35 = ul_2 + l_34;
*((__global double *) ul_35) = d_8;
ul_36 = ul_3 + l_34;
*((__global double *) ul_36) = d_9;
i_37 = i_4 + i_6;
i_6 = i_37;
} // B5
// BLOCK 6
return;
} // kernel
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0x1024500 (Intel(R) UHD Graphics 630) (err:-2)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
d_22 = sin(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4765:22: note: expanded from macro 'sin'
#define sin(__x) __fast_relax_sin(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_sin, native_sin, __cl_sin);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous
d_24 = cos(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4758:22: note: expanded from macro 'cos'
#define cos(__x) __fast_relax_cos(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_cos, native_cos, __cl_cos);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:46:16: error: call to '__cl_fma' is ambiguous
d_26 = fma(d_22, d_23, d_25);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
<program source>:49:16: error: call to '__cl_fma' is ambiguous
d_29 = fma(d_24, d_15, d_28);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_INVALID_VALUE] : OpenCL Error : clGetProgramBuildInfo failed: return buffer size (8192 bytes) was too small to hold the result: 11019 bytes
[TornadoVM-OCL-JNI] ERROR : clGetProgramBuildInfo -> Returned: -30
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[[email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
at [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
at [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
at [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
And the kernel with native functions enabled:
tornado --printKernel -Dtornado.enable.nativeFunctions=True -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
bm=dft-15-8192 , id=java-reference , average=3.819124e+09, median=3.791125e+09, firstIteration=4.083596e+09, best=3.519395e+09
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
_frame[0] = (ulong) _heap_base;
} // kernel
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
ulong ul_21, ul_40, ul_41, ul_0, ul_1, ul_2, ul_3, ul_19;
long l_17, l_18, l_16, l_37, l_38, l_39;
double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_20, d_22, d_12, d_14, d_13, d_32, d_31, d_34, d_33, d_35;
int i_9, i_8, i_7, i_6, i_5, i_4, i_36, i_15, i_11, i_10, i_42;
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
ul_0 = (ulong) _frame[3];
ul_1 = (ulong) _frame[4];
ul_2 = (ulong) _frame[5];
ul_3 = (ulong) _frame[6];
i_4 = get_global_size(0);
i_5 = i_4 + 8191;
i_6 = i_5 / i_4;
i_7 = get_global_id(0);
i_8 = i_6 * i_7;
i_9 = i_8 + i_6;
i_10 = min(i_9, 8192);
// BLOCK 1 MERGES [0 5 ]
i_11 = i_8;
for(;i_11 < i_10;)
{
// BLOCK 2
d_12 = (double) i_11;
// BLOCK 3 MERGES [2 4 ]
d_13 = 0.0;
d_14 = 0.0;
i_15 = 0;
for(;i_15 < 8192;)
{
// BLOCK 4
l_16 = (long) i_15;
l_17 = l_16 << 3;
l_18 = l_17 + 24L;
ul_19 = ul_0 + l_18;
d_20 = *((__global double *) ul_19);
ul_21 = ul_1 + l_18;
d_22 = *((__global double *) ul_21);
d_23 = (double) i_15;
d_24 = d_23 * 6.283185307179586;
d_25 = d_24 * d_12;
d_26 = d_25 / 8192.0;
d_27 = native_sin(d_26);
d_28 = -d_20;
d_29 = native_cos(d_26);
d_30 = d_29 * d_22;
d_31 = fma(d_27, d_28, d_30);
d_32 = d_14 + d_31;
d_33 = d_27 * d_22;
d_34 = fma(d_29, d_20, d_33);
d_35 = d_13 + d_34;
i_36 = i_15 + 1;
d_13 = d_35;
d_14 = d_32;
i_15 = i_36;
} // B4
// BLOCK 5
l_37 = (long) i_11;
l_38 = l_37 << 3;
l_39 = l_38 + 24L;
ul_40 = ul_2 + l_39;
*((__global double *) ul_40) = d_13;
ul_41 = ul_3 + l_39;
*((__global double *) ul_41) = d_14;
i_42 = i_11 + 1;
i_11 = i_42;
} // B5
// BLOCK 6
return;
} // kernel
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz) (err:-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
kernel referenced external symbol '_Z10native_cosd' which could not be found.
kernel referenced external symbol '_Z10native_sind' which could not be found.
[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[[email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), [email protected]/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
at [email protected]/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
at [email protected]/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
at [email protected]/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
at [email protected]/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at [email protected]/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
I remembered that OpenCL has been a dead end on Macs (see, e.g., https://www.reddit.com/r/OpenCL/comments/qevdbg/opencl_30_on_macos/)
Something to keep in mind during the next round of purchases for any scientific team.
Nevertheless, it seems we can work with OpenCL on the CPU device by passing -Dtornado.enable.nativeFunctions=False,
so it should be good enough for development for now.
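For anyone reading the build log above: the two unresolved symbols look like Itanium-mangled C++ names, and decoding them suggests why turning off native functions helps. Below is a minimal sketch of a decoder. The class name `MangledName` and the tiny type table are my own, for illustration only, and it handles just the simple `_Z<len><name><types>` form seen in this log. Both symbols decode to double-precision overloads, whereas OpenCL C defines the `native_*` math builtins only for float types, so presumably that is why the Apple compiler cannot resolve them.

```java
import java.util.Map;

// Hypothetical helper, not part of TornadoVM: decodes simple Itanium-mangled
// function names of the form _Z<length><name><builtin type codes>.
final class MangledName {
    // Only a small subset of the builtin type codes from the Itanium C++ ABI.
    private static final Map<Character, String> TYPES =
            Map.of('f', "float", 'd', "double", 'i', "int", 'j', "unsigned int");

    static String decode(String symbol) {
        if (!symbol.startsWith("_Z")) {
            throw new IllegalArgumentException("not a mangled name: " + symbol);
        }
        // Read the decimal length prefix of the function name.
        int i = 2;
        while (i < symbol.length() && Character.isDigit(symbol.charAt(i))) {
            i++;
        }
        if (i == 2) {
            throw new IllegalArgumentException("missing name length: " + symbol);
        }
        int len = Integer.parseInt(symbol.substring(2, i));
        if (i + len > symbol.length()) {
            throw new IllegalArgumentException("truncated name: " + symbol);
        }
        String name = symbol.substring(i, i + len);

        // Everything after the name is one type code per parameter.
        StringBuilder params = new StringBuilder();
        for (int k = i + len; k < symbol.length(); k++) {
            String type = TYPES.get(symbol.charAt(k));
            if (type == null) {
                throw new IllegalArgumentException("unsupported type code: " + symbol.charAt(k));
            }
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(type);
        }
        return name + "(" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(decode("_Z10native_cosd")); // native_cos(double)
        System.out.println(decode("_Z10native_sind")); // native_sin(double)
    }
}
```

So the DFT kernel appears to call `native_cos` and `native_sin` with double arguments, which would explain why it builds once native functions are disabled.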
I remembered OpenCL has been a dead-end for Macs (i.e. https://www.reddit.com/r/OpenCL/comments/qevdbg/opencl_30_on_macos/)
Yes, this is unfortunate news.