fastGPT
Draft: inline gelu
This gets to 0.594s, but the code is not as readable as before. I am going to keep this as a Draft for now: the ideas are good, but ultimately this kind of inlining should be done by the compiler.
I don't see any speed difference with caching enabled (both main and this PR are at 0.288s). With caching disabled, this PR takes 0.525s and main takes 0.716s.
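For readers unfamiliar with the trade-off discussed here, the following is a minimal Python sketch of what "inlining gelu" means. It is not the code from this PR. It assumes the tanh approximation of GELU used by GPT-2, and the function names `activation_with_call` and `activation_inlined` are hypothetical, chosen only for illustration.

```python
import math


def gelu(x):
    # Tanh approximation of GELU (assumed here; this is the form GPT-2 uses).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


def activation_with_call(xs):
    # Readable version: one function call per element.
    return [gelu(x) for x in xs]


def activation_inlined(xs):
    # Hand-inlined version: the body of gelu is pasted into the loop and the
    # constant is hoisted out. Same result, but the formula is now buried in
    # the caller, which is the readability cost of inlining by hand.
    c = math.sqrt(2.0 / math.pi)
    out = []
    for x in xs:
        out.append(0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x * x * x))))
    return out
```

Both versions compute the same values. An optimizing compiler can often perform this transformation automatically, which is why keeping the readable form in the source is preferable when the compiler can be relied on to inline it.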