Wei Tian

5 comments by Wei Tian

@inducer My application spends most of its time on the device (it's a computational fluid dynamics code, in case you're wondering 😄). I only counted the time the kernel is...

@inducer My bad. I think I confused `kernel time` with `execution time`. So, in your experience, is the kernel time (the combined cost of setting args, enqueueing, and execution) of...

> The Python bits (setting and preparing arguments) is slower, but this time can (and should be) hidden by kernel execution, which occurs asynchronously on the device.

Makes sense!
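To see why the host-side cost can be hidden, here is a deliberately simplified model (an illustration, not PyOpenCL code). It assumes identical kernels on one in-order queue, each costing a fixed `host_s` to set up and enqueue and a fixed `device_s` to execute; the function name `modeled_wall_time` is hypothetical. Waiting on every event serializes the two costs, while enqueueing without waiting lets the host prepare the next kernel while the device runs the current one.

```python
def modeled_wall_time(n_kernels: int, host_s: float, device_s: float, wait_each: bool) -> float:
    """Idealized wall-clock time for n identical kernels on a single in-order queue."""
    host_clock = 0.0   # time at which the host finishes its current set_args + enqueue
    device_free = 0.0  # time at which the device finishes the previous kernel
    for _ in range(n_kernels):
        host_clock += host_s  # host-side cost of preparing and enqueueing this kernel
        # A kernel starts only once it is enqueued AND the previous kernel is done.
        start = max(host_clock, device_free)
        device_free = start + device_s
        if wait_each:
            # event.wait() blocks the host until this kernel has finished.
            host_clock = device_free
    return max(host_clock, device_free)
```

For example, with a host cost of 1 and a device cost of 10 (any consistent unit), ten kernels take 110 units when waiting after each one but 101 units when enqueued back to back: only the first kernel's host cost remains exposed. Real queues add launch latency and variable costs, so this is intuition rather than a prediction.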

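@inducer Consider the following code:

```python
import time

import pyopencl as cl

# event.profile.* is only available if profiling is enabled on the queue
queue = cl.CommandQueue(..., properties=cl.command_queue_properties.PROFILING_ENABLE)

t1 = time.perf_counter()
kernels[1].set_args(...)
# enqueue the kernel; this returns immediately with an event
event = cl.enqueue_nd_range_kernel(queue, kernels[1], ...)
event.wait()  # block until the kernel has finished on the device
t2 = time.perf_counter()

total_time = t2 - t1  # host wall-clock: set_args + enqueue + execution + wait
execution_time = (event.profile.end - event.profile.start) * 1e-9  # device time, ns -> s
```

I searched...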
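The two numbers in that snippet measure different things, and the gap between them is the host-side and queueing cost. A minimal sketch of that split, assuming the queue was created with profiling enabled; `timing_breakdown` is a hypothetical helper, and the timestamps are OpenCL profiling values in nanoseconds.

```python
NS_PER_S = 1e9


def timing_breakdown(wall_s: float, queued_ns: int, start_ns: int, end_ns: int) -> dict:
    """Split a host-measured interval using OpenCL profiling timestamps (nanoseconds)."""
    execution_s = (end_ns - start_ns) / NS_PER_S       # kernel actually running on the device
    queue_delay_s = (start_ns - queued_ns) / NS_PER_S  # enqueued but not yet running
    return {
        "wall_s": wall_s,
        "execution_s": execution_s,
        "queue_delay_s": queue_delay_s,
        # Everything in the wall-clock interval that was not device execution:
        # set_args, enqueue, queueing delay, and the latency of event.wait().
        "non_execution_s": wall_s - execution_s,
    }
```

With PyOpenCL it could be called as `timing_breakdown(t2 - t1, event.profile.queued, event.profile.start, event.profile.end)`. Only differences between device timestamps are used, so the device clock never has to be compared with the host clock.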

It turned out I was confusing `cl.enqueue_copy` with `_cl._enqueue_copy_buffer`. The first takes its buffer arguments in the order `dst_buffer, src_buffer`, while the second takes `src_buffer, dst_buffer`...
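One way to guard against this kind of mix-up is a thin wrapper that forces both buffers to be passed by keyword. The sketch below is hypothetical (`copy_to` is not part of PyOpenCL). It forwards to `cl.enqueue_copy`, which takes the destination before the source, and accepts an injectable `copy_fn` so it can be exercised without an OpenCL device.

```python
from typing import Any, Callable, Optional


def copy_to(queue: Any, *, dst: Any, src: Any,
            copy_fn: Optional[Callable[..., Any]] = None, **kwargs: Any) -> Any:
    """Copy src into dst; dst and src are keyword-only, so their order cannot be swapped."""
    if copy_fn is None:
        # Imported lazily so the wrapper itself does not require PyOpenCL.
        import pyopencl as cl
        copy_fn = cl.enqueue_copy
    # cl.enqueue_copy(queue, dest, src, **kwargs): destination comes first.
    return copy_fn(queue, dst, src, **kwargs)
```

A call then reads `copy_to(queue, dst=dev_buf, src=host_array)`, and extra keyword arguments such as `is_blocking` are passed through unchanged.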