clpeak

[src] use CL_PROFILING_COMMAND_END as latency time

Open alohali opened this issue 4 years ago • 4 comments

CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED is the real kernel latency

alohali avatar Apr 16 '20 02:04 alohali

Is it more accurate to measure kernel latency as CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED while running an extremely small kernel? I see a >20us difference on several ARM Mali GPU devices.

alohali avatar Apr 16 '20 02:04 alohali

Thanks. I agree with the small kernel part. I am seeing more latency for cpu platforms like pocl. How can 'CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED' give better accuracy wrt CL_PROFILING_COMMAND_START?

krrishnarraj avatar Apr 16 '20 13:04 krrishnarraj

> Thanks. I agree with the small kernel part. I am seeing more latency for cpu platforms like pocl. How can 'CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED' give better accuracy wrt CL_PROFILING_COMMAND_START?

Because kernel launch latency consists of pre-launch latency, post-launch latency, and other execution latency. CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_QUEUED only measures the pre-launch part, not the post-launch part. CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED includes both. With an extremely small kernel, the real kernel execution time is almost zero.
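To make the difference between the two measurements concrete, here is a minimal sketch in plain C. It assumes the four timestamps have already been read with clGetEventProfilingInfo from an event on a queue created with CL_QUEUE_PROFILING_ENABLE; the `event_times` struct and the function names are hypothetical and not part of clpeak or OpenCL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for profiling timestamps in nanoseconds, as they
 * would be returned by clGetEventProfilingInfo for one kernel event. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} event_times;

/* START - QUEUED: only the time spent before the kernel starts running. */
uint64_t pre_launch_ns(const event_times *t) {
    assert(t->start >= t->queued);
    return t->start - t->queued;
}

/* END - START: for a near-empty kernel this is mostly overhead rather
 * than useful work. */
uint64_t start_to_end_ns(const event_times *t) {
    assert(t->end >= t->start);
    return t->end - t->start;
}

/* END - QUEUED: the proposed latency figure, covering both spans above. */
uint64_t queued_to_end_ns(const event_times *t) {
    assert(t->end >= t->queued);
    return t->end - t->queued;
}
```

With made-up timestamps such as `{1000, 4000, 21000, 21500}`, `pre_launch_ns` gives 20000 and `queued_to_end_ns` gives 20500; the 500 ns gap is the part that a START - QUEUED measurement leaves out.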

alohali avatar Apr 23 '20 01:04 alohali

From https://stackoverflow.com/questions/39924433/opencl-events-ambiguity it seems to me that CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_SUBMIT is the pre-execution latency. CL_PROFILING_COMMAND_COMPLETE was added in OpenCL 2.0. I'm guessing CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_END is the post-execution latency.

There may also be a lower bound on CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_START, which might be another form of latency.

So CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_SUBMIT on a very small kernel may be a way to measure the latency.
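As a rough sketch of that breakdown, the arithmetic could look like the following in plain C. It assumes an OpenCL 2.0+ device and that the timestamps were already fetched with clGetEventProfilingInfo; the `device_times` struct and function names are hypothetical, and reading COMPLETE - END as post-execution latency is the guess made above, not something confirmed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for profiling timestamps in nanoseconds. */
typedef struct {
    uint64_t submit;    /* CL_PROFILING_COMMAND_SUBMIT   */
    uint64_t start;     /* CL_PROFILING_COMMAND_START    */
    uint64_t end;       /* CL_PROFILING_COMMAND_END      */
    uint64_t complete;  /* CL_PROFILING_COMMAND_COMPLETE (OpenCL 2.0+) */
} device_times;

/* START - SUBMIT: time between submission to the device and execution. */
uint64_t pre_execution_ns(const device_times *t) {
    assert(t->start >= t->submit);
    return t->start - t->submit;
}

/* COMPLETE - END: assumed here to approximate post-execution latency. */
uint64_t post_execution_ns(const device_times *t) {
    assert(t->complete >= t->end);
    return t->complete - t->end;
}

/* COMPLETE - SUBMIT: the candidate overall latency for a tiny kernel. */
uint64_t submit_to_complete_ns(const device_times *t) {
    assert(t->complete >= t->submit);
    return t->complete - t->submit;
}
```

For made-up timestamps `{2000, 15000, 15400, 18000}`, the pre-execution part is 13000 ns, the post-execution part is 2600 ns, and COMPLETE - SUBMIT is 16000 ns, with the remaining 400 ns being END - START.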

nchristensen avatar Oct 07 '22 18:10 nchristensen