llm.c
Why CUDA when we can SYCL
I know that CUDA is considered 'vanilla', but why not give SYCL a shot? We could finally have things work on all GPUs out of the box. I benchmarked SYCL and it's pretty good: https://chsasank.com/portblas-portable-blas-across-gpus.html
My favorite implementation is AdaptiveCpp, though DPC++ is pretty good too.
Do you plan to use BLAS libraries?
We can finally right this wrong and be truly open source :).
I found this repo which contains a port of the kernels under dev/cuda to SYCL.
Wow this is amazing!
I'm working on a SYCL port too. It runs on the SYCL CPU device, Intel GPUs, and an NVIDIA A100. My handwritten kernels aren't that great yet, but using `gemm` and `gemm_batch` from the open-source oneMKL Interfaces, which wrap Intel's oneMKL and NVIDIA's cuBLAS libraries among other backends, dramatically brings down some of the timings. It has been so much fun trying different things to bring the timings down! Timings for the B=8, T=1024 case are a little over 1 s per step.
Thanks @karpathy for this wonderful repository and for your YouTube video series!