pyopencl Image not initialized on GPU

It seems that an image is sometimes not correctly initialized on an NVIDIA GPU on Windows.

import numpy as np
import pyopencl as cl
from pyopencl import cltypes

platform = cl.get_platforms()[0]
print(platform, platform.version)

device = platform.get_devices()[0]
print(
    device,
    "image support: {}".format(device.get_info(cl.device_info.IMAGE_SUPPORT)),
)

ctx = cl.Context([device])
print(ctx)

cmd_queue = cl.CommandQueue(ctx)

image_data = np.arange(0.0, 10.0)

image_device = cl.Image(
    ctx,
    cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
    cl.ImageFormat(cl.channel_order.RG, cl.channel_type.FLOAT),
    hostbuf=np.stack((image_data, image_data), axis=1).astype(cltypes.float),
)

output_array = np.empty((10,), dtype=cltypes.float)
output_device = cl.Buffer(
    ctx, cl.mem_flags.READ_WRITE, 10 * np.dtype(cltypes.float).itemsize
)

kernel_string = """
__kernel void test(__read_only image1d_t img, __global float * output){
    sampler_t img_sampler = CLK_NORMALIZED_COORDS_FALSE |
                            CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;
    int i = get_global_id(0);
    float x = 0.5f + i;
    output[i] = read_imagef(img, img_sampler, x).x;
}
"""
program = cl.Program(ctx, kernel_string).build(
    "-cl-single-precision-constant -I."
)
program.test(cmd_queue, (10,), None, image_device, output_device)
cl.enqueue_copy(cmd_queue, output_array, output_device)
print(output_array)

Output:

<pyopencl.Platform 'NVIDIA CUDA' at 0x2da1b7114c0> OpenCL 1.2 CUDA 10.2.120
<pyopencl.Device 'GeForce GTX 1080 Ti' on 'NVIDIA CUDA' at 0x2da1b7118d0> image support: 1
<pyopencl.Context at 0x2da18593670 on <pyopencl.Device 'GeForce GTX 1080 Ti' on 'NVIDIA CUDA' at 0x2da1b7118d0>>
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

However, if I choose the Intel platform with the CPU as the device, the output is correct:

<pyopencl.Platform 'Intel(R) OpenCL' at 0x24f397a87d0> OpenCL 2.1
<pyopencl.Device 'Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz' on 'Intel(R) OpenCL' at 0x24f397e22e0> image support: 1
<pyopencl.Context at 0x24f39fadbc0 on <pyopencl.Device 'Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz' on 'Intel(R) OpenCL' at 0x24f397e22e0>>
C:\Users\nanqin\Miniconda3\envs\opencl\lib\site-packages\pyopencl\__init__.py:235: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  "to see more.", CompilerWarning)
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

OS: Windows 10

I also tested with an AMD GPU on a Mac, and the output is correct:

<pyopencl.Platform 'Apple' at 0x7fad2a020890> OpenCL 1.2 (Feb 22 2019 20:16:07)
<pyopencl.Device 'AMD Radeon Pro 555X Compute Engine' on 'Apple' at 0x7fad2a1031b0> image support: 1
<pyopencl.Context at 0x7fad2a1016a0 on <pyopencl.Device 'AMD Radeon Pro 555X Compute Engine' on 'Apple' at 0x7fad2a1031b0>>
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
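
One thing that may be worth ruling out, given that only the NVIDIA run returns zeros: the `cl.Image` call in the script passes no `shape`. To my understanding (an assumption, not verified against this pyopencl version), pyopencl then falls back to `hostbuf.shape`, which here is `(10, 2)` and would describe a 2D image rather than the `image1d_t` the kernel reads. The sketch below only inspects the host array with NumPy; `image_nbytes` and `make_image_1d` are hypothetical helpers, the latter showing how the shape could be passed explicitly.

```python
import numpy as np

# Same host data as in the report: ten pixels, two float32 channels (R, G) each.
image_data = np.arange(0.0, 10.0)
host = np.stack((image_data, image_data), axis=1).astype(np.float32)

# Assumption: with shape=None, pyopencl uses hostbuf.shape as the image shape.
inferred_shape = host.shape        # two dimensions, which reads as a 2D image
intended_shape = (host.shape[0],)  # one dimension: ten RG pixels in a 1D image

bytes_per_pixel = 2 * host.dtype.itemsize  # RG order, 4 bytes per channel


def image_nbytes(shape):
    """Bytes of an RG/float image with the given pixel shape."""
    return int(np.prod(shape)) * bytes_per_pixel


print("host bytes:", host.nbytes)
print("inferred shape", inferred_shape, "needs", image_nbytes(inferred_shape))
print("intended shape", intended_shape, "needs", image_nbytes(intended_shape))


def make_image_1d(ctx, host):
    """Hypothetical helper: create the image with an explicit 1D shape."""
    import pyopencl as cl  # imported lazily so the checks above run without OpenCL

    return cl.Image(
        ctx,
        cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
        cl.ImageFormat(cl.channel_order.RG, cl.channel_type.FLOAT),
        shape=(host.shape[0],),
        hostbuf=host,
    )
```

If the inferred shape is indeed what the driver receives, a 2D image would need twice as many bytes as the host array holds, and reading it through `image1d_t` would not be well defined, which could explain why the drivers disagree. This is a guess to test, not a confirmed diagnosis.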

May 29 '19 21:05 Nan2018

Platform and device info:

================
  Platform # 1
================

Platform name         	:	NVIDIA CUDA
OpenCL version        	:	OpenCL 1.2 CUDA 10.2.120
Platform vendor       	:	NVIDIA Corporation
OpenCL profile        	:	FULL_PROFILE
Extensions            	:
                      	:	cl_khr_global_int32_base_atomics
                      	:	cl_khr_global_int32_extended_atomics
                      	:	cl_khr_local_int32_base_atomics
                      	:	cl_khr_local_int32_extended_atomics
                      	:	cl_khr_fp64
                      	:	cl_khr_byte_addressable_store
                      	:	cl_khr_icd
                      	:	cl_khr_gl_sharing
                      	:	cl_nv_compiler_options
                      	:	cl_nv_device_attribute_query
                      	:	cl_nv_pragma_unroll
                      	:	cl_nv_d3d10_sharing
                      	:	cl_khr_d3d10_sharing
                      	:	cl_nv_d3d11_sharing
                      	:	cl_nv_copy_opts
                      	:	cl_nv_create_buffer
                      	:	
Device(s)             	:	1

----------------
  Device # 1
----------------

Device name                                        	:	GeForce GTX 1080 Ti
OpenCL device type                                 	:	GPU
Vendor name                                        	:	NVIDIA Corporation
OpenCL version                                     	:	OpenCL 1.2 CUDA
Device vendor identifier                           	:	4318
OpenCL software driver version                     	:	430.86


Maximum number of samplers                         	:	32
Maximum number of work-items in a work-group       	:	1024
Maximum dimensions that specify work-item IDs      	:	3
Maximum number of work-items in each dimension     	:	1024, 1024, 64
Address space size                                 	:	32


Type of local memory                               	:	Local memory storage
Size of local memory arena (in bytes)              	:	49152
Type of global memory cache                        	:	Read-Write cache
Size of global memory cache (in bytes)             	:	458752
Size of global memory cache line (in bytes)        	:	128
Size of global device memory (in bytes)            	:	11811160064


Device is available                                	:	Yes
Compiler is available                              	:	Yes
Little endian device                               	:	Yes
Error correction support                           	:	No
Images are supported                               	:	Yes


Max width of 2D image (in pixels)                  	:	16384
Max height of 2D image (in pixels)                 	:	32768
Max width of 3D image (in pixels)                  	:	16384
Max height of 3D image (in pixels)                 	:	16384
Max depth of 3D image (in pixels)                  	:	16384


Resolution of device timer (in nanoseconds)        	:	1000
Maximum configured clock frequency (in MHz)        	:	1582
The number of parallel compute cores               	:	28
Max number of __constant arguments in a kernel     	:	9
Max size of a constant buffer allocation (in bytes)	:	65536
Max size of memory object allocation (in bytes)    	:	2952790016
Max size of kernel arguments (in bytes)            	:	4352
Max number of simultaneously read image objects    	:	256
Max number of simultaneously written image objects 	:	16
Alignment of the base address (in bits)            	:	4096
Minimum alignment for any data type (in bytes)     	:	128


Preferred native vector width size for char type   	:	1
Preferred native vector width size for short type  	:	1
Preferred native vector width size for int type    	:	1
Preferred native vector width size for long type   	:	1
Preferred native vector width size for float type  	:	1
Preferred native vector width size for double type 	:	1


Single precision floating-point capability         	:
                                                   	:	denorms are supported
                                                   	:	INF and NaNs are supported
                                                   	:	round to nearest even rounding mode supported
                                                   	:	round to zero rounding mode supported
                                                   	:	round to +ve and -ve infinity rounding modes supported
                                                   	:	IEEE754-2008 fused multiply-add is supported
Double precision fp capability                     	:
                                                   	:	denorms are supported
                                                   	:	INF and NaNs are supported
                                                   	:	round to nearest even rounding mode supported
                                                   	:	round to zero rounding mode supported
                                                   	:	round to +ve and -ve infinity rounding modes supported
                                                   	:	IEEE754-2008 fused multiply-add is supported
Half precision fp capability                       	:
                                                   	:	round to nearest even rounding mode supported
                                                   	:	round to zero rounding mode supported
Execution capabilities                             	:
                                                   	:	The OpenCL device can execute OpenCL kernels
Supported command-queue properties                 	:	Commands are executed out-of-order;The profiling of commands is enabled
Extensions                                         	:
                                                   	:	cl_khr_global_int32_base_atomics
                                                   	:	cl_khr_global_int32_extended_atomics
                                                   	:	cl_khr_local_int32_base_atomics
                                                   	:	cl_khr_local_int32_extended_atomics
                                                   	:	cl_khr_fp64
                                                   	:	cl_khr_byte_addressable_store
                                                   	:	cl_khr_icd
                                                   	:	cl_khr_gl_sharing
                                                   	:	cl_nv_compiler_options
                                                   	:	cl_nv_device_attribute_query
                                                   	:	cl_nv_pragma_unroll
                                                   	:	cl_nv_d3d10_sharing
                                                   	:	cl_khr_d3d10_sharing
                                                   	:	cl_nv_d3d11_sharing
                                                   	:	cl_nv_copy_opts
                                                   	:	cl_nv_create_buffer

May 29 '19 21:05 Nan2018

According to the NVIDIA CUDA download page, CUDA 10 is the latest version.

May 29 '19 21:05 Nan2018

~~Hello, my GPU is integrated graphics, an Intel HD 515. My CPU is an Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz.~~

~~I am using clinfo, and the output is:~~

Platform Name                 Intel(R) OpenCL
Number of devices             2
Device Name                   Intel(R) Gen9 HD Graphics NEO
Device Vendor                 Intel(R) Corporation
Device Vendor ID              0x8086
Device Version                OpenCL 2.1 NEO
Driver Version                18.28.11080
Device OpenCL C Version       OpenCL C 2.0
Device Type                   GPU
Device Profile                FULL_PROFILE
Max compute units             24
Max clock frequency           850MHz
Device Partition              (core)
Max number of sub-devices     0
Supported partition types     None
Max work item dimensions      3
Max work item sizes           256x256x256

~~But when I print the device with pyopencl, I get the following result: <pyopencl.Device 'Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz' on 'Intel(R) CPU Runtime for OpenCL(TM) Applications' at 0x1a4b0c8> image support: 1~~

Only the CPU is listed.

I want to know how to select the GPU in pyopencl. Thanks.
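
On selecting the GPU: the script at the top of this thread always takes `cl.get_platforms()[0]`, and a CPU runtime and a GPU driver can register as separate platforms, so the GPU may simply sit on a different platform index. A minimal sketch, assuming the GPU is visible to pyopencl at all, that scans every platform instead of only the first one. `find_devices` and `first_gpu_context` are hypothetical helper names, not part of the pyopencl API.

```python
def find_devices(platforms, wanted_type):
    """Return (platform, device) pairs whose type bitfield includes wanted_type."""
    matches = []
    for platform in platforms:
        for device in platform.get_devices():
            if device.type & wanted_type:
                matches.append((platform, device))
    return matches


def first_gpu_context():
    """Create a context on the first GPU found on any platform."""
    import pyopencl as cl  # imported here so find_devices works without OpenCL

    gpus = find_devices(cl.get_platforms(), cl.device_type.GPU)
    if not gpus:
        raise RuntimeError("no OpenCL GPU device found on any platform")
    platform, device = gpus[0]
    print("using", device.name, "on", platform.name)
    return cl.Context([device])
```

Calling `first_gpu_context()` returns a context that can stand in for `cl.Context([device])` in the script. pyopencl also offers `cl.create_some_context()`, which can prompt for a platform and device when run interactively.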

Sep 01 '19 15:09 subshall