Hüseyin Tuğrul BÜYÜKIŞIK issues

Results: 26 issues of Hüseyin Tuğrul BÜYÜKIŞIK

Image decode+resize+multiple_encode pipeline

The pipeline should consume one image at each step (pushed as data into the pipeline), with all stages (decode, resize, encode) running concurrently and opportunistically on multiple GPUs.

Epic

feature
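A minimal sketch of the staged pipeline idea, using plain threads and queues in place of GPUs. The `decode`, `resize`, and `encode` functions are hypothetical stand-ins, not part of the library; each stage runs in its own thread, so one image can be decoded while another is resized or encoded.

```python
import queue
import threading


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and push the result to outbox."""
    while True:
        item = inbox.get()
        if item is None:       # sentinel: forward it and stop this stage
            outbox.put(None)
            break
        outbox.put(fn(item))


# Hypothetical stand-ins for the real GPU work.
def decode(data):
    return ("decoded", data)


def resize(image):
    return ("resized", image[1])


def encode(image):
    return ("encoded", image[1])


def run_pipeline(images):
    """Push images one at a time; all three stages run concurrently."""
    fns = [decode, resize, encode]
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for image in images:
        queues[0].put(image)
    queues[0].put(None)        # no more input

    results = []
    while True:
        out = queues[-1].get()
        if out is None:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

With one worker per stage the output order matches the input order; running a stage on several devices at once would need explicit reordering.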

Add built-in matrix multiplication with sizes between 2x2 and 8192x8192

Batched 2x2, 4x4, 16x16, and 32x32 multiplications; a single 8k x 8k multiplication with sub-matrix partitioning to improve load balancing; N levels of partitioning (4, 16, 64, 256 sub-matrices) or M levels of batching.

enhancement
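A rough sketch of the sub-matrix partitioning idea in plain Python, assuming square matrices whose size is divisible by the number of tiles per side. `parts=2` gives 4 sub-matrices, `parts=4` gives 16, and so on; this mapping to the levels in the issue is an interpretation, and `blocked_matmul` is an illustrative name only.

```python
def matmul(a, b):
    """Reference multiplication for list-of-lists matrices."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def blocked_matmul(a, b, parts):
    """Multiply square matrices by splitting them into parts x parts tiles."""
    n = len(a)
    if n % parts != 0:
        raise ValueError("matrix size must be divisible by parts")
    tile = n // parts
    c = [[0] * n for _ in range(n)]
    for bi in range(parts):
        for bj in range(parts):
            # Each (bi, bj) output tile is independent of the others,
            # so tiles could be handed to different devices.
            for bk in range(parts):
                for i in range(bi * tile, (bi + 1) * tile):
                    for j in range(bj * tile, (bj + 1) * tile):
                        acc = 0
                        for k in range(bk * tile, (bk + 1) * tile):
                            acc += a[i][k] * b[k][j]
                        c[i][j] += acc
    return c
```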

Add device-limit stress testing to obtain numbers that can be used later in production, or to raise an alarm when approaching limits.

OpenCL cannot report the maximum number of command queues. Add a test that creates up to 1024 command queues, until it gives an out-of-memory or out-of-resources error, save the...

enhancement
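One way such a probe could look, sketched without any OpenCL calls. `probe_limit` and `make_fake_factory` are hypothetical names; in a real test the `create` callback would create a command queue and the exception would be whatever the binding raises for an out-of-memory or out-of-resources error.

```python
def probe_limit(create, max_count=1024, release=None):
    """Create resources until create() fails or max_count is reached.

    Returns (number_created, error_or_None).
    """
    handles = []
    error = None
    try:
        for _ in range(max_count):
            handles.append(create())
    except Exception as exc:   # assumption: failures surface as exceptions
        error = exc
    finally:
        count = len(handles)
        if release is not None:
            for handle in handles:
                release(handle)
    return count, error


def make_fake_factory(limit):
    """Simulated device that runs out of resources after `limit` creations."""
    created = []

    def create():
        if len(created) >= limit:
            raise MemoryError("out of resources (simulated)")
        created.append(object())
        return created[-1]

    return create
```

The measured count could then be stored and compared against live usage to warn before the limit is reached.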

Add speed-ratio indicator between devices after 10-20 iterations

Takes the average of the last 10-20 iterations (or of all iterations), and also compares compute time against buffer copy/access times to give an efficiency percentage.

enhancement
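A small sketch of how the indicator might be computed, assuming per-iteration timings are already collected per device. The function names and the choice to normalise ratios so they sum to 1 are assumptions for illustration.

```python
def average_last(times, window=20):
    """Average of the last `window` timings (or all, if fewer exist)."""
    recent = times[-window:]
    return sum(recent) / len(recent)


def speed_ratios(times_by_device, window=20):
    """Relative speed per device: inverse average time, normalised to sum 1."""
    inverse = {
        device: 1.0 / average_last(times, window)
        for device, times in times_by_device.items()
    }
    total = sum(inverse.values())
    return {device: value / total for device, value in inverse.items()}


def efficiency_percent(compute_time, transfer_time):
    """Share of total time spent computing rather than copying buffers."""
    return 100.0 * compute_time / (compute_time + transfer_time)
```

For example, a device averaging 1 ms per iteration next to one averaging 3 ms would get ratios of 0.75 and 0.25.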

Disposing unused buffers with warning message

The API creates a new buffer for each unique array given as a parameter; with enough arrays, it could run out of resources. * LRU cache to hold max=N buffers (regardless of...

enhancement
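A possible shape for the LRU idea, sketched with an `OrderedDict`. `BufferCache`, `create`, and `release` are hypothetical names; the cache keys on array identity and keeps a reference to each array so the identity stays valid while the entry is cached.

```python
import warnings
from collections import OrderedDict


class BufferCache:
    """Keep at most max_buffers device buffers, evicting the least recently used."""

    def __init__(self, max_buffers, create, release):
        self.max_buffers = max_buffers
        self.create = create      # array -> buffer
        self.release = release    # buffer -> None
        self._entries = OrderedDict()   # id(array) -> (array, buffer)

    def __len__(self):
        return len(self._entries)

    def get(self, array):
        key = id(array)
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            return self._entries[key][1]
        if len(self._entries) >= self.max_buffers:
            _, (_, old_buffer) = self._entries.popitem(last=False)
            warnings.warn("buffer cache full: disposing least recently used buffer")
            self.release(old_buffer)
        buffer = self.create(array)
        self._entries[key] = (array, buffer)
        return buffer
```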

Force multiple-of-64 for array size when using streaming and C++ arrays (`CL_MEM_USE_HOST_PTR`)

I don't know whether Intel, AMD, or Nvidia handle this error internally and fall back to the `CL_MEM_ALLOC_HOST_PTR` version.

question
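The rounding itself is simple; a sketch follows. Whether the multiple applies to element count or byte size is not stated in the issue, so the unit here is an assumption, and `round_up` and `pad_to_multiple` are illustrative names.

```python
def round_up(size, multiple=64):
    """Smallest multiple of `multiple` that is >= size."""
    return ((size + multiple - 1) // multiple) * multiple


def pad_to_multiple(values, multiple=64, fill=0):
    """Return a copy of values padded with `fill` up to the next multiple."""
    values = list(values)
    return values + [fill] * (round_up(len(values), multiple) - len(values))
```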

No offline compiler

Adding `clCreateProgramWithBinary()` support might be useful for FPGA owners. FPGAs may take hours to compile a single kernel, while a gaming GPU can do it in seconds.

enhancement

ClArray<T> CopyTo CopyFrom for larger and smaller arrays.

For now, it can copy only arrays of the same size. Being able to copy differently sized arrays could help in some cases.

enhancement
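One possible semantics for differently sized copies, sketched on Python lists: copy as many elements as fit and report the count. The offsets and the `count` parameter are assumptions, not the existing `CopyTo`/`CopyFrom` signature.

```python
def copy_to(src, dst, src_offset=0, dst_offset=0, count=None):
    """Copy from src into dst, clamped to what both arrays can hold.

    Returns the number of elements actually copied.
    """
    available = min(len(src) - src_offset, len(dst) - dst_offset)
    n = available if count is None else min(count, available)
    n = max(n, 0)
    dst[dst_offset:dst_offset + n] = src[src_offset:src_offset + n]
    return n
```

Copying a larger array into a smaller one truncates; copying a smaller one into a larger one leaves the remaining destination elements untouched.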

Sequential kernel executions in same `compute()` method

`array.compute(cruncher, 1, "kernel1 kernel2 kernel3", globalSize, localSize)`: here all kernels listed in the parameter are run with the same globalSize and localSize. globalSize and localSize should support multiple values. Overloading compute with...

enhancement
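A sketch of how per-kernel sizes could be resolved, assuming an overload that accepts either a single value (applied to every kernel) or one value per kernel. `plan_launches` is a hypothetical helper, and the divisibility check reflects the OpenCL 1.x rule that the global size must be a multiple of the local size.

```python
def plan_launches(kernels, global_sizes, local_sizes):
    """Pair each kernel name with its own global and local size."""
    names = kernels.split()

    def expand(value, label):
        if isinstance(value, int):
            return [value] * len(names)   # one value shared by all kernels
        value = list(value)
        if len(value) != len(names):
            raise ValueError(f"{label}: expected {len(names)} values, got {len(value)}")
        return value

    globals_ = expand(global_sizes, "global_sizes")
    locals_ = expand(local_sizes, "local_sizes")
    for name, g, l in zip(names, globals_, locals_):
        if g % l != 0:
            raise ValueError(f"{name}: global size {g} is not a multiple of local size {l}")
    return list(zip(names, globals_, locals_))
```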

Assets link is broken.

Where can I get assets to try the application?