James Price
We probably just need to increase the tolerance. The error will also be proportional to the size of the arrays (unlike with the other kernels), so we need to make...
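A minimal sketch of what a size-dependent tolerance could look like, assuming the error grows linearly with the number of elements as described above. The helper names `scaled_tolerance` and `within_tolerance` and the base tolerance value are hypothetical and are not taken from the project's test code.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: scale a per-element tolerance by the array size,
// assuming the accumulated error is proportional to the element count.
inline double scaled_tolerance(double base_tolerance, std::size_t num_elements) {
    return base_tolerance * static_cast<double>(num_elements);
}

// Hypothetical helper: relative comparison against a reference value.
// The reference magnitude is clamped to 1.0 so that values near zero
// fall back to an absolute comparison.
inline bool within_tolerance(double result, double reference, double tolerance) {
    const double scale = std::fmax(std::fabs(reference), 1.0);
    return std::fabs(result - reference) <= tolerance * scale;
}
```

A check would then call something like `within_tolerance(gpu_sum, host_sum, scaled_tolerance(1e-8, n))`, so that larger arrays are allowed a proportionally larger error.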
This adds the core of a new WebGPU backend. There are still many holes in the functionality and rough edges around how this interacts with Emscripten, native WebGPU implementations, and...
The GPU code generated for the test below produces incorrect results on the OpenCL and Metal backends (and the WIP WebGPU backend), and may well do so on others too. The test...
Approach
----------
A single timer is a static storage duration object with a name/description, accumulates timings over the course of the application run, and prints out the total and average...
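The description above could be sketched roughly as follows. This is an illustration only: the class name `AccumulatingTimer`, its nested `Scope` helper, the report format, and the choice to print from the destructor are assumptions, not the actual implementation.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical accumulating timer. When declared with static storage
// duration, its destructor runs at program exit and prints the summary.
class AccumulatingTimer {
public:
    explicit AccumulatingTimer(std::string name) : name_(std::move(name)) {}

    ~AccumulatingTimer() {
        std::printf("%s: total %lld ns, average %lld ns over %llu samples\n",
                    name_.c_str(),
                    static_cast<long long>(total_.count()),
                    static_cast<long long>(average().count()),
                    static_cast<unsigned long long>(count_));
    }

    // Record one measured interval.
    void add(std::chrono::nanoseconds elapsed) {
        total_ += elapsed;
        ++count_;
    }

    std::chrono::nanoseconds total() const { return total_; }
    std::uint64_t count() const { return count_; }

    // Average per sample; zero when nothing has been recorded.
    std::chrono::nanoseconds average() const {
        if (count_ == 0) return std::chrono::nanoseconds(0);
        return total_ / static_cast<std::int64_t>(count_);
    }

    // RAII helper: measures the lifetime of the scope and adds it to a timer.
    class Scope {
    public:
        explicit Scope(AccumulatingTimer& timer)
            : timer_(timer), start_(std::chrono::steady_clock::now()) {}
        ~Scope() {
            timer_.add(std::chrono::duration_cast<std::chrono::nanoseconds>(
                std::chrono::steady_clock::now() - start_));
        }
    private:
        AccumulatingTimer& timer_;
        std::chrono::steady_clock::time_point start_;
    };

private:
    std::string name_;
    std::chrono::nanoseconds total_{0};
    std::uint64_t count_ = 0;
};
```

Under these assumptions, a call site would declare `static AccumulatingTimer timer("kernel launch");` once and place `AccumulatingTimer::Scope scope(timer);` at the top of the region to be measured. This sketch is not thread-safe.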
Several `clGetDeviceInfo` queries such as [`CL_DEVICE_GLOBAL_MEM_CACHE_SIZE`](https://github.com/kpet/clvk/blob/master/src/api.cpp#L447) and [`CL_DEVICE_MAX_COMPUTE_UNITS`](https://github.com/kpet/clvk/blob/master/src/api.cpp#L477) currently have placeholder values such as `0` or `1`, since there is no way to query these device properties from Vulkan...
The approach of using a global class instance to destruct the Vulkan instance/devices does not work when the OpenCL application also uses global destructors for cleaning up OpenCL objects. For...
When using UBOs/SSBOs for POD arguments, each kernel enqueue performs a new memory allocation, which is freed when the kernel instance completes. We don't currently track how many kernel instances...
I'm hitting some build issues on a platform where some of the symbols that clvk defines in the global namespace clash with those defined elsewhere (e.g. `string`). I'd recommend...
Compiling this with `-cl-std=CL2.0 -inline-entry-points`:
```
const int data = 42;
kernel void foo() {}
```
produces SPIR-V that fails to validate:
```
error: line 22: StorageBuffer OpVariable '11[%11]' has...
```
Reduced from OpenCL CTS `vload_half` test:
```
kernel void test( const global half *p, global float2 *f )
{
    local ushort data[2] __attribute__((aligned(sizeof(uint))));
    local half* hdata_p = (local half*) data;
    ...
```