Kernel tuning and benchmarking

Open cm2435 opened this issue 1 year ago • 3 comments

Hey! Just opening an issue because there doesn't seem to be a discussion board.

I noticed there's no tuning of the Triton kernels for things like block size, and not much coverage of whether the kernels are really faster than the native torch ops.

Is this a conscious decision? Kernel tuning takes time, on the order of perhaps a minute added to model fitting, but for a training job that takes more than an hour, a tuned kernel that is even $1/60 \approx 1.7$% faster would pay for itself, which is quite a low bar.
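To spell that break-even arithmetic out, here is a minimal sketch. The function name `break_even_speedup` and the one-minute / one-hour figures are only the illustrative numbers from this comment, not measurements.

```python
def break_even_speedup(tuning_overhead_s: float, job_runtime_s: float) -> float:
    """Fraction of the job's runtime that tuning must save to pay for itself.

    Tuning breaks even when (time saved) == (time spent tuning), i.e.
    job_runtime_s * speedup == tuning_overhead_s.
    """
    if job_runtime_s <= 0:
        raise ValueError("job_runtime_s must be positive")
    return tuning_overhead_s / job_runtime_s


# Illustrative numbers only: ~1 minute of tuning on a 1 hour training job.
needed = break_even_speedup(tuning_overhead_s=60, job_runtime_s=3600)
print(f"{needed:.2%}")  # prints 1.67%
```

The longer the job, the lower the bar: the same minute of tuning on a ten-hour run only needs to save about 0.17% of the runtime.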

I'd happily cherry-pick my kernel tests from the PHI-2 implementation branch and write some simple benchmarks with them so we can test the impact of kernel tuning, if there's interest?

cm2435 avatar Jan 26 '24 09:01 cm2435

@cm2435 Oh fair point on auto-tuning block sizes - I found a block size of roughly 1024 to be reasonably OK on Tesla T4s and A100s. I think I tuned some myself by hand, so technically I did do some tuning, just not auto-tuning :) There's actually an autotuner in Triton which lets you automatically select the fastest option. I do agree you can squeeze even more out :)

danielhanchen avatar Jan 26 '24 11:01 danielhanchen

@danielhanchen Yeah, that was what I was going to PR. The thing is that the Triton autotuner has a little overhead, because it tries a big combinatorial list of block and warp sizes and then picks the fastest one for your specific matrix shape. So I was wondering whether it was worth the tradeoff, or at least worth measuring.
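A rough pure-Python sketch of that search-then-cache behaviour, to show where the overhead comes from. This is not Triton's actual autotuner: `ShapeKeyedTuner`, `candidate_configs` and `toy_cost` are hypothetical names, and `toy_cost` is a made-up stand-in for really timing a kernel launch.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Config = Tuple[int, int]   # (block_size, num_warps)
Shape = Tuple[int, ...]    # e.g. (rows, cols) of the input matrix


def candidate_configs(block_sizes: Sequence[int],
                      warp_counts: Sequence[int]) -> List[Config]:
    """Every (block_size, num_warps) pair: the combinatorial search space."""
    return list(itertools.product(block_sizes, warp_counts))


class ShapeKeyedTuner:
    """Try every config once per new shape, then reuse the winner."""

    def __init__(self, configs: Sequence[Config],
                 measure: Callable[[Config, Shape], float]) -> None:
        self.configs = list(configs)
        self.measure = measure            # returns a cost; lower is better
        self.cache: Dict[Shape, Config] = {}
        self.trials = 0                   # how many measurements were paid for

    def best_config(self, shape: Shape) -> Config:
        if shape not in self.cache:       # the search only runs on a cache miss
            timings = {}
            for cfg in self.configs:
                timings[cfg] = self.measure(cfg, shape)
                self.trials += 1
            self.cache[shape] = min(timings, key=timings.get)
        return self.cache[shape]


def toy_cost(cfg: Config, shape: Shape) -> float:
    """Made-up cost model (NOT a real measurement), cheapest when the block
    size matches the last dimension and num_warps is 4."""
    block_size, num_warps = cfg
    return abs(block_size - shape[-1]) + 0.1 * abs(num_warps - 4)


tuner = ShapeKeyedTuner(candidate_configs([256, 512, 1024, 2048], [2, 4, 8]),
                        toy_cost)
tuner.best_config((4096, 1024))   # first call: pays for 12 trials
tuner.best_config((4096, 1024))   # same shape: served from the cache
print(tuner.trials)               # prints 12
```

The cost scales with (number of configs) x (number of distinct shapes seen), which is the part worth measuring against the speedup from the tuned config.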

cm2435 avatar Jan 26 '24 14:01 cm2435

Yeah agreed - in fact the overhead is kinda annoying LOLL - I remember it was 2-5ms. The issue is one has to benchmark across T4s, A100s and other GPUs. A better approach: before the kernel runs, we "patch" the Triton auto-dispatcher to call only the single best one - this can be done, but it'll require some work on the auto-patching side of things
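One way the "only call the best one" idea could look, as a hedged sketch rather than how Triton's dispatcher is actually patched: keep a small table of pinned block sizes per GPU and only fall back to a search for unknown hardware. `PINNED_BLOCK_SIZES` and `pick_block_size` are hypothetical names, the substring matching on the device name is an assumption, and the 1024 entries just reuse the hand-tuned figure mentioned earlier in this thread.

```python
from typing import Callable, Dict

# Hand-tuned values quoted earlier in this thread; anything else is unknown.
PINNED_BLOCK_SIZES: Dict[str, int] = {
    "T4": 1024,
    "A100": 1024,
}


def pick_block_size(device_name: str,
                    search: Callable[[], int],
                    pinned: Dict[str, int] = PINNED_BLOCK_SIZES) -> int:
    """Return the pinned block size for a known GPU, skipping the search.

    `device_name` is assumed to be a string such as the one returned by
    torch.cuda.get_device_name(); matching is by substring (an assumption).
    `search` is whatever slow autotuning fallback is available.
    """
    for marker, block_size in pinned.items():
        if marker in device_name:
            return block_size      # known GPU: no tuning overhead at all
    return search()                # unknown GPU: pay for the search once


search_calls = []


def slow_search() -> int:
    """Placeholder for a real autotuning run; records that it was invoked."""
    search_calls.append(1)
    return 512                     # arbitrary illustrative result


print(pick_block_size("Tesla T4", slow_search))   # prints 1024
print(len(search_calls))                          # prints 0
```

The table still has to be filled in by benchmarking on each GPU ahead of time, so this moves the tuning cost from the user's run to whoever maintains the table.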

danielhanchen avatar Jan 26 '24 18:01 danielhanchen