llm.c
CUDA code that approaches cuBLAS performance
https://colab.research.google.com/drive/1RNFSPtD0o9aJFwnqKQSRabODtSZjwPN1 by https://makslevental.github.io/, which is based on https://siboehm.com/articles/22/CUDA-MMM, seems quite fast. I'm also looking at this: https://thunder.snu.ac.kr/?page_id=64&page=6. I'm just fishing for opinions here; my plan is to emulate that blog/website and start by implementing matmul_forward for this repo (rough sketch below). If anyone else wants to use these as references, please go ahead.
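
To make the idea concrete, here is a minimal sketch of the first optimization step from the siboehm post (shared-memory tiling) applied to the forward matmul. The kernel name matmul_forward_tiled is mine, and the shapes are my reading of the CPU reference in this repo, so treat them as assumptions: inp is (B*T, C), weight is (OC, C), bias is (OC), out is (B*T, OC), and out = inp @ weight^T + bias. This is nowhere near cuBLAS yet, just a starting point.

```cuda
#include <cuda_runtime.h>

#define TILE 16

// Shared-memory tiled sketch of the forward matmul:
// out[i, o] = bias[o] + sum_k inp[i, k] * weight[o, k]
// where i indexes B*T rows, o indexes OC output channels, k indexes C.
__global__ void matmul_forward_tiled(float* out, const float* inp,
                                     const float* weight, const float* bias,
                                     int BT, int C, int OC) {
    __shared__ float inp_s[TILE][TILE];  // tile of inp:    (rows of out) x (k slice)
    __shared__ float w_s[TILE][TILE];    // tile of weight: (cols of out) x (k slice)

    int row = blockIdx.y * TILE + threadIdx.y;  // index into B*T
    int col = blockIdx.x * TILE + threadIdx.x;  // index into OC

    float acc = 0.0f;
    for (int t = 0; t < (C + TILE - 1) / TILE; t++) {
        int k = t * TILE + threadIdx.x;
        int wRow = blockIdx.x * TILE + threadIdx.y;  // output channel for this weight row
        // each thread stages one element of the inp tile and one of the weight tile,
        // zero-padding out-of-bounds elements so partial tiles stay correct
        inp_s[threadIdx.y][threadIdx.x] = (row < BT && k < C) ? inp[row * C + k] : 0.0f;
        w_s[threadIdx.y][threadIdx.x] = (wRow < OC && k < C) ? weight[wRow * C + k] : 0.0f;
        __syncthreads();

        // partial dot product over this K-tile
        for (int kk = 0; kk < TILE; kk++) {
            acc += inp_s[threadIdx.y][kk] * w_s[threadIdx.x][kk];
        }
        __syncthreads();
    }

    if (row < BT && col < OC) {
        out[row * OC + col] = acc + (bias ? bias[col] : 0.0f);
    }
}

// Launch sketch (B, T, C, OC named as in the repo; BT = B * T):
// dim3 block(TILE, TILE);
// dim3 grid((OC + TILE - 1) / TILE, (BT + TILE - 1) / TILE);
// matmul_forward_tiled<<<grid, block>>>(d_out, d_inp, d_weight, d_bias, B * T, C, OC);
```

From there the siboehm progression would be the same as in the article: global-memory coalescing, register/warp tiling with each thread computing several outputs, and vectorized loads, which is where most of the gap to cuBLAS closes.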