Hüseyin Tuğrul BÜYÜKIŞIK
Hüseyin Tuğrul BÜYÜKIŞIK
Such that it will consume 1 image at each step(push as data to pipeline) and all stages(decode resize encode) will run concurrently, opportunistically on multiple GPUs.
batched 2x2 4x4 16x16 32x32 single 8k x 8k with sub-matrix partitioning to increase load balancing N-levels of partitioning (4,16,64,256 sub matrices) or M-levels of batching
OpenCL can't get max number of command-queues. Add a test that creates command queues up to 1024, until it gives out of memory or out of resources error, save the...
Gets average of last 10-20 or all iterations, getting compute time versus buffer copy/access times for an efficiency percentage too.
api is creating a new buffer for each unique array given as parameter, with enough arrays, it could give out of resources. * LRU cache to hold max=N buffers(regardless of...
I don't know if Intel,Amd or Nvidia fixes this error inside and fallsback to cl_mem_alloc version.
Adding a `clCreateProgramFromBinary()` might be useful for FPGA owners. FPGAs may take hours to compile a single kernel while a gaming GPU can do it in seconds.
For now, it can copy only same sized arrays. Being able to copying differently sized arrays could help in some cases.
`array.compute(cruncher, 1, "kernel1 kernel2 kernel3", globalSize, localSize)` here all kernels listed in parameter are run with same globalSize and localSize. globalsize and localSize should support multiple values. Overloading compute with...
Where can I get assets to try the application?