openvino icon indicating copy to clipboard operation
openvino copied to clipboard

[GPU] Swiglu OPT kernel implementation

Open dnkurek opened this issue 1 year ago • 5 comments

Details:

  • Implement Swiglu OPT kernel
  • Use fast erf function instead of slow OpenCL erf

Tickets:

dnkurek avatar Aug 22 '24 04:08 dnkurek

Is there a reason to add a new opt kernel without applying fast_erf to the ref kernel?

e-ddykim avatar Aug 22 '24 04:08 e-ddykim

Is there a reason to add a new opt kernel without applying fast_erf to the ref kernel?

Well, while fast_erf is the most important part of this performance-wise, this opt kernel has a simpler layout and later on it can be modified to support block reads (which I did and it did give more performance boost aswell, but it's more work which can be done later, so it's not included in this PR)

So it's just convenience in my opinion, plus it's slightly faster.

dnkurek avatar Aug 22 '24 04:08 dnkurek

Is there a reason to add a new opt kernel without applying fast_erf to the ref kernel?

Plus, also a faster implementation for tanh will be needed in the future, but it's out of scope of this PR

dnkurek avatar Aug 22 '24 04:08 dnkurek

Is there a reason to add a new opt kernel without applying fast_erf to the ref kernel?

Also, since this kernel has grown to support more than just swiglu, wouldn't you think it would make sense to rename this operation to something like "gated activation" in a later PR? I think the name now is a bit misleading.

dnkurek avatar Aug 22 '24 04:08 dnkurek

Is there a reason to add a new opt kernel without applying fast_erf to the ref kernel?

Also, since this kernel has grown to support more than just swiglu, wouldn't you think it would make sense to rename this operation to something like "gated activation" in a later PR? I think the name now is a bit misleading.

Yes, I agree with you. "gated activation" or "GLU activation" would be better than the current name.

e-ddykim avatar Aug 22 '24 04:08 e-ddykim