llm.c

A small speedup is possible with a simple modification

Open kurtulmehtap opened this issue 10 months ago • 1 comments

In the file train_gpt2.py:

You can replace the line

return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))

with

return 0.5 * input * (1.0 + torch.tanh(0.7978845608 * (input + 0.044715 * torch.pow(input, 3.0))))

as math.sqrt(2.0 / math.pi) is approximately equal to 0.7978845608. Note that the tanh itself cannot be folded into a constant: tanh(math.sqrt(2.0 / math.pi)) is approximately 0.66285246118, but the tanh is applied to the whole product, not to the constant alone.

More instances can be found if the code is scanned carefully. This line alone replaces a divide and a square root (many cycles on x64 and ARM) with a single constant.
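
As a sanity check on which part of that expression is a true constant, here is a minimal sketch using plain Python floats as a stand-in for the tensor code in train_gpt2.py. The function names (gelu_reference, gelu_hoisted, gelu_folded_tanh) and the sample points are hypothetical and chosen only for illustration. It compares the original formula against two variants: one that precomputes math.sqrt(2.0 / math.pi), and one that replaces tanh(math.sqrt(2.0 / math.pi)) with 0.66285246118 and drops the tanh call.

```python
import math

# Assumption: scalar floats stand in for torch tensors; the arithmetic is the same.
SQRT_2_OVER_PI = 0.7978845608028654  # precomputed value of math.sqrt(2.0 / math.pi)


def gelu_reference(x: float) -> float:
    """tanh-approximate GELU as written in the original line."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def gelu_hoisted(x: float) -> float:
    """Same formula with sqrt(2/pi) replaced by a literal; tanh is kept."""
    return 0.5 * x * (1.0 + math.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)))


def gelu_folded_tanh(x: float) -> float:
    """Variant that treats tanh(sqrt(2/pi)) as a constant factor and drops tanh."""
    return 0.5 * x * (1.0 + 0.66285246118 * (x + 0.044715 * x ** 3))


samples = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
max_err_hoisted = max(abs(gelu_reference(x) - gelu_hoisted(x)) for x in samples)
max_err_folded = max(abs(gelu_reference(x) - gelu_folded_tanh(x)) for x in samples)

print("max error, sqrt(2/pi) hoisted:", max_err_hoisted)
print("max error, tanh folded away:  ", max_err_folded)
```

The hoisted variant should agree with the reference to within floating-point rounding, while the folded variant diverges as the input moves away from zero, because tanh(a * b) is not equal to tanh(a) * b. Only the sqrt(2/pi) factor can safely become a literal.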

kurtulmehtap avatar Apr 14 '24 16:04 kurtulmehtap

Line in question here. Any idea on speed up/reduction in number of operations?

GerardWalsh avatar Apr 15 '24 11:04 GerardWalsh