errors with complex dtype on some Macs

Open SyamGadde opened this issue 7 years ago • 3 comments

Some kernels using complex types seem to fail on some Macs. Using PyOpenCL pulled from GitHub today, I see the following failures from test_clmath.py on an iMac 17,1 (ca. 2015) with an AMD Radeon R9 M380:

planck:~ sg9$ env PYOPENCL_NO_CACHE=1 PYTHONPATH=~/checkedout/pyopencl/build/lib.macosx-10.11-x86_64-2.7:~/checkedout/pyopencl/.eggs/cffi-1.9.1-py2.7-macosx-10.11-x86_64.egg ipython ~/checkedout/pyopencl/test/test_clmath.py 
================================== test session starts ==================================
platform darwin -- Python 2.7.11, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
rootdir: /Users/sg9/checkedout/pyopencl, inifile: 
collected 54 items

checkedout/pyopencl/test/test_clmath.py .F.................F.F........................s.sFs.sF

The failures are in test_exp, test_log, test_tanh, test_complex_bessel, test_hankel_01_complex.

All the tests succeed on another iMac 14,2 (ca. 2013) with an NVIDIA GeForce GT 755M.

I could send the full output of the tests, but the following code fails on the AMD Radeon R9 M380:

In [2]: import numpy; import pyopencl; import pyopencl.array; ctx = pyopencl.create_some_context(); queue = pyopencl.CommandQueue(ctx); a_g = pyopencl.array.to_device(queue, numpy.array([2], dtype=numpy.complex64)); b_g = pyopencl.array.to_device(queue, numpy.array([2], dtype=numpy.complex64)); c_g = a_g + b_g; print "%s + %s = %s" % (a_g.get(), b_g.get(), c_g.get(),)
Choose platform:
[0] <pyopencl.Platform 'Apple' at 0x7fff0000>
Choice [0]:0
Choose device(s):
[0] <pyopencl.Device 'Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz' on 'Apple' at 0xffffffff>
[1] <pyopencl.Device 'AMD Radeon R9 M380 Compute Engine' on 'Apple' at 0x1021c00>
Choice, comma-separated [0]:1
Set the environment variable PYOPENCL_CTX='0:1' to avoid being asked again.
[ 2.+0.j] + [ 2.+0.j] = [ 0.+0.j]

Obviously, the answer to 2 + 2 should be 4. a_g and b_g seem to be transferring to the device correctly, but the kernel computing the addition does not seem to do the right thing. In my own kernels, when I send a complex-valued array as an argument, the kernel does not seem to see the correct values (I print them with printf's and usually get 0 or something ridiculous like 0 + 0.XXXXe-38). So it seems like the kernels are not getting the correct array as inputs or are interpreting them incorrectly.
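The check above can be scripted so that it runs against every device instead of one picked interactively. This is a minimal sketch, assuming Python 3 with numpy and pyopencl installed; the helper names `looks_correct` and `check_all_devices` are made up for illustration and are not part of PyOpenCL.

```python
def looks_correct(result, expected=4 + 0j, tol=1e-6):
    """Return True if a complex sum read back from a device matches `expected`."""
    return abs(complex(result) - expected) <= tol


def check_all_devices():
    """Run the 2 + 2 complex64 addition on each OpenCL device and record the outcome."""
    # Imported lazily so the pure helper above works without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl
    import pyopencl.array as cl_array

    host = np.array([2], dtype=np.complex64)
    outcomes = {}
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            try:
                ctx = cl.Context([device])
                queue = cl.CommandQueue(ctx)
                a_g = cl_array.to_device(queue, host)
                b_g = cl_array.to_device(queue, host)
                total = (a_g + b_g).get()[0]
                if looks_correct(total):
                    outcomes[device.name] = "ok"
                else:
                    outcomes[device.name] = "wrong: %r" % (total,)
            except Exception as exc:
                # Some drivers may raise instead of returning a wrong value.
                outcomes[device.name] = "error: %s" % (exc,)
    return outcomes
```

Calling `check_all_devices()` returns a dict mapping each device name to "ok", "wrong: ..." or "error: ...", which makes it easier to compare machines than reading the interactive prompt output.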

Curious to see if anyone else is experiencing this! Thanks!

SyamGadde, Jan 10 '17 18:01

Have you tried the CPU device? In my case (macpro with Radeon R9 M370X) the GPU also fails with complex types, and the cause becomes clearer on the CPU, which raises: NotImplementedError: No work-around to Apple's broken structs-as-kernel arg handling has been found. Cannot pass complex numbers to kernels. It seems to be an issue with the default Apple AMD driver.

hightower8083, Jan 10 '17 23:01

Aha, I never tried CPU because the workgroup limits on CPU devices on Apple made it useless for me. Thank you! That means this may be the same issue as:

https://lists.tiker.net/pipermail/pyopencl/2016-February/002114.html

because replacing the new complex struct/union in pyopencl-complex.h with the old float2 and .x and .y accessors (or float4 and .xy and .zw) fixes things for me (except for cfloat_log() which is just over threshold).

I wonder whether it would be possible to test-build the struct/union version when creating a context and, if that fails, have PyOpenCL pass a #define on subsequent compiles for that device so that pyopencl-complex.h uses the old/unsafe float2/float4 method?

For now I will just use a custom pyopencl-complex.h, but if you think the above solution has merit I would be happy to attempt it.
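The probe-then-fall-back idea might look roughly like the sketch below. Everything in it is hypothetical: the macro name `EXAMPLE_COMPLEX_USE_VECTOR`, the function `complex_build_options`, and the `probe` callable are illustrative and do not exist in PyOpenCL. The probe is passed in as a function that returns True when the struct/union layout gives correct results on a device. It would probably need to run a small kernel and compare values, not only build one, since the Radeon report above shows a kernel that ran but returned 0 for 2 + 2.

```python
# Hypothetical macro; pyopencl-complex.h would have to be patched to honor it.
FALLBACK_DEFINE = "-DEXAMPLE_COMPLEX_USE_VECTOR=1"

# Remember the probe result per device so the probe runs only once.
_probe_cache = {}


def complex_build_options(device_key, probe):
    """Return extra compiler options for kernels that use complex types.

    device_key: any hashable identifier for the device (for example its name).
    probe: callable taking device_key and returning True if the struct/union
           complex layout works on that device. An exception counts as failure.
    """
    if device_key not in _probe_cache:
        try:
            _probe_cache[device_key] = bool(probe(device_key))
        except Exception:
            _probe_cache[device_key] = False
    # No extra options when the struct/union version works; otherwise ask the
    # header (via the hypothetical macro) to use the float2/float4 layout.
    return [] if _probe_cache[device_key] else [FALLBACK_DEFINE]
```

The returned list could then be handed to the `options` argument of `pyopencl.Program(...).build(...)` for that device, so user kernels would not need to change.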

-syam

SyamGadde, Jan 11 '17 02:01

Well, in an ideal world, Apple would just fix their compiler. That said, they're not the only CL implementation struggling with that. If you can make it so the header swaps itself while keeping all code using it identical, I might be willing to consider a patch.
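One way a header could swap its representation while leaving calling code unchanged is to hide field access behind macros. The sketch below is an illustration under stated assumptions, not the actual contents of pyopencl-complex.h: the type `example_cfloat_t`, the macros `EXAMPLE_REAL` and `EXAMPLE_IMAG`, and the switch `EXAMPLE_COMPLEX_USE_VECTOR` are all invented names. The OpenCL C source is held in Python strings because it would be compiled through PyOpenCL.

```python
# Hypothetical header: one preprocessor switch selects the representation.
EXAMPLE_HEADER = r"""
#ifdef EXAMPLE_COMPLEX_USE_VECTOR
/* Fallback: plain vector type with .x/.y accessors. */
typedef float2 example_cfloat_t;
#define EXAMPLE_REAL(z) ((z).x)
#define EXAMPLE_IMAG(z) ((z).y)
#else
/* Default: struct with named fields. */
typedef struct { float real, imag; } example_cfloat_t;
#define EXAMPLE_REAL(z) ((z).real)
#define EXAMPLE_IMAG(z) ((z).imag)
#endif

inline example_cfloat_t example_cfloat_add(example_cfloat_t a, example_cfloat_t b)
{
    example_cfloat_t r;
    EXAMPLE_REAL(r) = EXAMPLE_REAL(a) + EXAMPLE_REAL(b);
    EXAMPLE_IMAG(r) = EXAMPLE_IMAG(a) + EXAMPLE_IMAG(b);
    return r;
}
"""

# User kernel: identical in both modes because it never touches fields directly.
EXAMPLE_KERNEL = r"""
__kernel void add(__global const example_cfloat_t *a,
                  __global const example_cfloat_t *b,
                  __global example_cfloat_t *out)
{
    size_t i = get_global_id(0);
    out[i] = example_cfloat_add(a[i], b[i]);
}
"""


def kernel_source(use_vector):
    """Assemble program source; only the leading #define differs between modes."""
    switch = "#define EXAMPLE_COMPLEX_USE_VECTOR 1\n" if use_vector else ""
    return switch + EXAMPLE_HEADER + EXAMPLE_KERNEL
```

With this shape, code that already goes through accessor macros or helper functions stays identical. Code that reads fields such as `.real` directly would still break in vector mode, which is the part a real patch would have to address.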

inducer, Jan 11 '17 07:01