FelixCLC
Hi all, I'm researching Apple AMX as a potential way of speeding up IEEE FP BLAS kernels in OpenBLAS. On the macOS side, it seems that...
It may be possible to increase performance by using the io_uring asynchronous API within the program for the input read. Example of a basic setup here: https://twitter.com/axboe/status/1576671920488972288/photo/1 Man page here: https://man.archlinux.org/man/io_uring.7
Hi Travis, I came across your post on AVX-512 (https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html) and, after skimming through it, it seemed like your methodology and documentation were strong. Looking at other posts, this seems to hold true...
### Environment

| Hardware | description |
|----------|---------------|
| GPU | -gfx1010-rx5600xt |
| CPU | -12700k-AVX512 pcores only |

| Software | version |
|----------|---------|
| OS | -...
## System information

model name : 12th Gen Intel(R) Core(TM) i7-12700K
00:02.0 Display controller [0380]: Intel Corporation AlderLake-S GT1 [8086:4680] (rev 0c)
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices,...
The initial idea of using POCL as a CUDA translation layer isn't viable because POCL does not work with image formats on CUDA. Currently reaching out to Yasroslav Pogrebnyak, the developer...
One of the requirements to pull this whole thing together will be the ability to parse the arguments requested from the host and switch them to hardware-accelerated versions. For example, if...
Have to find out if I'm better off using UT vs KP. Kubernetes is borderline standard for distributed compute these days. It would make scaling out to more nodes much easier...
Any interest in commands for SDR/HDR tonemapping? There are versions for Vulkan, oneAPI, CUDA, OpenCL, and CPU.