llm.c
Rewrite the encoder_forward float4 kernel with pack128
No noticeable change in performance after switching from float4 to pack128.
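A host-side C++ sketch of what the rewrite does. It assumes a `Packed128`-style type that holds 16 bytes, plus `load128`/`store128` helpers. The names follow the PR title, but these definitions are illustrative and are not llm.c's actual code. Each emulated "thread" handles one 128-bit chunk of channels for a single (b, t) position and computes `out = wte[token] + wpe[t]`. Here `memcpy` stands in for a single 128-bit memory access. For `float` data a 128-bit pack holds four elements, the same as a `float4`, so per-thread memory traffic does not change. That fits the absence of a measurable speedup. The sketch assumes `C` is a multiple of the pack size.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical 128-bit packed vector: 16 bytes of T (4 floats, like a float4).
template <class T>
struct alignas(16) Packed128 {
    static constexpr int size = static_cast<int>(16 / sizeof(T));
    T payload[size];
    T& operator[](int i) { return payload[i]; }
    const T& operator[](int i) const { return payload[i]; }
};

// memcpy of 16 bytes stands in for one 128-bit load/store on the GPU.
template <class T>
Packed128<T> load128(const T* src) {
    Packed128<T> p;
    std::memcpy(p.payload, src, sizeof(p.payload));
    return p;
}
template <class T>
void store128(T* dst, const Packed128<T>& p) {
    std::memcpy(dst, p.payload, sizeof(p.payload));
}

using x128 = Packed128<float>;

// Work done by one "thread": x128::size consecutive channels of one (b, t).
void encoder_forward_pack128_thread(int thread_id, float* out, const int* inp,
                                    const float* wte, const float* wpe,
                                    int B, int T, int C) {
    int idx = thread_id * x128::size;  // first output element owned by this thread
    if (idx >= B * T * C) return;
    int bt = idx / C;                  // flattened b * T + t
    int t = bt % T;
    int c = idx % C;                   // channel offset (C % x128::size == 0 assumed)
    int ix = inp[bt];                  // token id at (b, t)
    x128 wte128 = load128(wte + ix * C + c);
    x128 wpe128 = load128(wpe + t * C + c);
    x128 packed_out;
    for (int k = 0; k < x128::size; k++) {
        packed_out[k] = wte128[k] + wpe128[k];
    }
    store128(out + bt * C + c, packed_out);
}

// Emulate the kernel launch with a loop over thread ids.
void encoder_forward_pack128(float* out, const int* inp, const float* wte,
                             const float* wpe, int B, int T, int C) {
    int num_threads = (B * T * C) / x128::size;
    for (int tid = 0; tid < num_threads; tid++) {
        encoder_forward_pack128_thread(tid, out, inp, wte, wpe, B, T, C);
    }
}

// Scalar reference: one element at a time, for comparison.
void encoder_forward_ref(float* out, const int* inp, const float* wte,
                         const float* wpe, int B, int T, int C) {
    for (int b = 0; b < B; b++) {
        for (int t = 0; t < T; t++) {
            int ix = inp[b * T + t];
            for (int i = 0; i < C; i++) {
                out[b * T * C + t * C + i] = wte[ix * C + i] + wpe[t * C + i];
            }
        }
    }
}
```

Comparing the packed version against the scalar reference on small random inputs is a cheap way to confirm that the new indexing matches the old kernel before timing it.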