MIVisionX
OpenVX Framework - Kernel execution on CPU
Hi,
I was going through the code to understand the implementation a bit, in particular how kernels get executed in parallel on the CPU when the graph has nodes that can run in parallel. Am I wrong, or do all nodes/kernels of a graph get executed serially on a single core? At least this is what I understand from looking at the agoExecuteGraph() function. Maybe the situation is different for OpenCL.
@bogdanul2003 the nodes in the graph execute serially on a single core. The nodes themselves can use the available cores to execute parallel computation. OpenCL nodes occupy the required number of CUs when launched on a GPU.
@bogdanul2003 just to add to what @kiritigowda said: multiple sub_graphs can be created and run on different cores. OpenCL always uses parallel threads on the GPU.
One thing to mention: at the moment my workload is CPU-only. If I understand correctly, I need to compile the framework with OpenCL support to get nodes executed in parallel on different CPU cores? My question, which I forgot to spell out, was about the case where you compile without OpenCL support. @rrawther is it possible to run sub_graphs on different cores without OpenCL? I couldn't figure out what decides which sub_graphs can be executed on different cores.
@bogdanul2003: Currently the OpenCL implementation targets the GPU only. We don't have an OpenCL implementation that executes in parallel on different CPU cores. Are you running on Windows or Linux? We have multithreading support on Windows, assuming you have created separate graphs for the nodes that have to run in parallel. Because of data dependencies, most OpenVX graphs are executed sequentially in our current implementation.
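A minimal sketch of the separate-graphs approach described above, for readers who want to try it on CPU. It is not taken from the MIVisionX sources: `Graph`, `process_graph`, and `process_graphs_in_parallel` are hypothetical stand-ins so the example compiles without OpenVX. In real code each `Graph` would be a verified `vx_graph` and `process_graph` would be a call to `vxProcessGraph`, assuming the build allows concurrent processing of different graphs from different threads.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a verified vx_graph: an ordered list of node kernels.
struct Graph {
    std::vector<std::function<void()>> nodes;
};

// Nodes of one graph run serially on the calling thread, as described in the
// thread. With OpenVX this would be a vxProcessGraph(graph) call instead.
void process_graph(const Graph& g) {
    for (const auto& node : g.nodes) {
        assert(node && "every node needs a kernel");
        node();
    }
}

// Runs each independent graph on its own thread and waits for all of them.
// Assumption: the graphs share no data, so no synchronization is needed.
void process_graphs_in_parallel(const std::vector<Graph>& graphs) {
    std::vector<std::thread> workers;
    workers.reserve(graphs.size());
    for (const auto& g : graphs) {
        workers.emplace_back([&g] { process_graph(g); });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Splitting the work into graphs is left to the application here, which matches the reply above: nothing in this sketch discovers parallelism automatically.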
Thanks @rrawther. I thought the framework could figure out which nodes can be executed in parallel from the way the graph is built. Do you plan to add this feature to the framework for CPU-only workloads as well? Do you know if other implementations of OpenVX (Nvidia or Intel) offer this kind of optimization?
@bogdanul2003 Once the nodes are submitted to the GPU, they can run in parallel provided there is no data dependency. The OpenVX framework checks whether the input data is ready before a node is executed. We don't have much insight into the Intel or NVIDIA implementations, but we will be adding enhancements to our implementation in the future.
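To illustrate the "input data is ready" check, here is a small sketch of how a scheduler could group nodes that have no data dependency on each other. It is an illustration only, not how agoExecuteGraph() works: the `deps` representation and the `parallel_waves` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// deps[i] lists the nodes whose outputs node i consumes (hypothetical layout).
// Returns "waves": all inputs of a node in a wave are produced by earlier
// waves, so nodes inside one wave could run in parallel.
std::vector<std::vector<std::size_t>> parallel_waves(
        const std::vector<std::vector<std::size_t>>& deps) {
    const std::size_t n = deps.size();
    std::vector<bool> done(n, false);
    std::vector<std::vector<std::size_t>> waves;
    std::size_t remaining = n;
    while (remaining > 0) {
        std::vector<std::size_t> ready;
        for (std::size_t i = 0; i < n; ++i) {
            if (done[i]) continue;
            bool inputs_ready = true;
            for (std::size_t d : deps[i]) {
                assert(d < n && "dependency index out of range");
                if (!done[d]) { inputs_ready = false; break; }
            }
            if (inputs_ready) ready.push_back(i);
        }
        if (ready.empty()) break;  // cyclic dependencies: stop, graph is invalid
        for (std::size_t i : ready) done[i] = true;
        remaining -= ready.size();
        waves.push_back(ready);
    }
    return waves;
}
```

For a diamond-shaped graph where node 0 feeds nodes 1 and 2, and both feed node 3, the function returns three waves, with nodes 1 and 2 sharing the middle one.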
@rrawther thanks for the clarification. Can we keep this ticket open until this feature is added?