GPTQ-triton

rotary embedding and layer norm

Open qwopqwop200 opened this issue 1 year ago • 1 comments

I've added two enhancements to the current GPTQ-for-LLaMa. Together they bring a speed-up.

1. Triton rotary embedding, implemented by aljungberg. The rotary embedding is implemented as a Triton kernel, which gives a huge speed-up. https://github.com/qwopqwop200/GPTQ-for-LLaMa/pull/221
2. Triton RMS norm. The RMS norm is implemented as a Triton kernel, which gives a slight extra speed boost. https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/quant/triton_norm.py
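For readers unfamiliar with the two operations, here is a minimal pure-Python sketch of what each one computes on a single vector. It is not the code from the linked PR or from `triton_norm.py`; the function names, the `eps` default, the `base=10000.0` frequency base, and the half-split pairing of elements are assumptions chosen to match the commonly used LLaMA formulation. The Triton versions presumably compute the same math in a fused GPU kernel, which is where the speed-up would come from.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMS norm for one vector: x / sqrt(mean(x^2) + eps) * weight.

    `eps` is an assumed default; the real model config may use another value.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rotary_embed(x, position, base=10000.0):
    """Reference rotary embedding for one head vector at one token position.

    Assumes the half-split pairing: element i is rotated together with
    element i + d/2 by the angle position * base^(-2i/d).
    """
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s          # first half of the rotated pair
        out[i + half] = b * c + a * s   # second half of the rotated pair
    return out


if __name__ == "__main__":
    hidden = [3.0, 4.0, 0.0, 0.0]
    print(rms_norm(hidden, [1.0] * 4))
    print(rotary_embed(hidden, position=5))
```

Both functions are elementwise or pairwise over a single vector, so a straightforward implementation launches several small operations per layer. Fusing each into one kernel is a plausible explanation for the gains reported above, with the rotary embedding benefiting more because it otherwise involves more intermediate tensors.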

qwopqwop200 • May 08 '23 14:05