llm.c
Why CUDA when we can SYCL
I know that CUDA is considered 'vanilla', but why not give SYCL a shot? We could finally have things work on all GPUs out of the box. I benchmarked SYCL and it's pretty good: https://chsasank.com/portblas-portable-blas-across-gpus.html
My favorite implementation is AdaptiveCpp, though DPC++ is pretty good too.
Do you plan to use BLAS libraries?
We can finally right this wrong and be truly open source :).
I found this repo which contains a port of the kernels under dev/cuda to SYCL.
Wow this is amazing!
I'm working on a SYCL port too. It runs on the SYCL CPU device, Intel GPUs, and an NVIDIA A100. My handwritten kernels aren't that great yet, but using `gemm` and `gemm_batch` from the open-source oneMKL Interfaces, which wrap Intel's oneMKL and NVIDIA's cuBLAS libraries among other backends, dramatically brings down some of the timings. It has been so much fun trying different things to bring the timings down! Timings for the B=8, T=1024 case are a little over 1 s per step.
Thanks @karpathy for this wonderful repository and for your YouTube video series!