Woosuk Kwon
cc @njhill Could you please take a look?
This must be fixed by #9350.
Hi @JackChuang, thanks for the PR. I think it's a cool idea, and we do want to reduce the CUDA kernels in our repo. However, our solution to this is...
Hi @esmeetu, thanks for the PR! Could you elaborate more on the benefit of this change? I think we can keep the current version for better compatibility with HF, if...
> I found that nn.GELU(approximate="tanh") is identical to NewGELU()
Could you provide a reference to this?
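For context on the claim being questioned: the PyTorch docs say that `nn.GELU(approximate="tanh")` computes `0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))`. Whether that matches `NewGELU()` depends on `NewGELU` using the same formula. The sketch below assumes it does, based on the tanh GELU formula commonly used in HF Transformers, which should be checked against the actual source. It uses pure Python so it runs without torch. The helper names `gelu_tanh` and `gelu_exact` are hypothetical. The sketch also shows how far the tanh form is from the exact erf-based GELU, which is the default for `nn.GELU()`.

```python
import math


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (the formula PyTorch documents for
    approximate="tanh"; assumed here to also be what NewGELU computes)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x: float) -> float:
    """Exact GELU, x * Phi(x), written with the error function
    (the default nn.GELU() behaviour, approximate="none")."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Sample [-5, 5] in steps of 0.1 and report the largest deviation
    # between the tanh approximation and the exact form.
    xs = [i / 10.0 for i in range(-50, 51)]
    max_gap = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in xs)
    print(f"max |tanh approx - exact| on [-5, 5]: {max_gap:.2e}")
```

If both modules implement this same expression, they agree up to floating-point rounding. They are not, however, identical to the exact (erf) GELU, so a model trained with one variant should be served with the same one.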
@mgoin Thanks for your review. Can you please take another look?
@robertgshaw2-redhat Can you please take a look?
> Going to summarize comments into a single PR
@robertgshaw2-neuralmagic Is there any update?
@mgoin +1. Let's update this. @abmfy Sorry for the late review. Could you please add an accuracy test for the new kernel?
@robertgshaw2-neuralmagic @tlrmchlsmth could you please take a look?