candle WIP: Precompile metal kernels into `.metallib` files

Open zackangelo opened this issue 1 year ago • 2 comments

I'm running into an issue where the first call to `apply_repeat_penalty` takes a very long time (in excess of 6 seconds). The time seems to be spent in the `Tensor::to_vec1` call that moves the logits into a `Vec<f32>`. I would expect a simple copy like this to be very fast.

It was suggested on Discord that this slowness might be caused by other asynchronous work happening in Metal, possibly the compilation of the kernels on first load.
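One way to check whether the cost is a one-time setup cost is to time several consecutive calls and compare the first against the rest. The sketch below is a generic helper, not part of candle; the name `time_calls` is hypothetical, and the closure stands in for whatever call is being measured.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `n` times and returns the wall-clock duration of each call,
/// in call order. `time_calls` is a hypothetical helper for illustration.
fn time_calls<T>(n: usize, mut f: impl FnMut() -> T) -> Vec<Duration> {
    (0..n)
        .map(|_| {
            let start = Instant::now();
            // The result is discarded; only the elapsed time is kept.
            let _ = f();
            start.elapsed()
        })
        .collect()
}
```

If only the first duration is large, the time is probably going to lazy initialization rather than to the copy itself. Note that a device-to-host copy also has to wait for previously queued GPU work, so the measured time may include work submitted before the call.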

This PR precompiles the kernels at build time instead of on every run. Unfortunately, it doesn't seem to solve my problem, but it might be useful for other reasons.
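As a rough sketch of what build-time precompilation can look like (not necessarily what this PR does), a build script can invoke Apple's `metal` and `metallib` tools through `xcrun`: each `.metal` source is first compiled to an intermediate `.air` file, and the `.air` files are then linked into one `.metallib`. The function names and file paths below are hypothetical. The helpers only construct the commands, and a real build script would still have to run them and check their exit status.

```rust
use std::path::Path;
use std::process::Command;

/// Builds the command that compiles one `.metal` source into an
/// intermediate `.air` file. `sdk` is an xcrun SDK name such as "macosx".
fn compile_air_command(sdk: &str, src: &Path, air: &Path) -> Command {
    let mut cmd = Command::new("xcrun");
    cmd.args(["-sdk", sdk, "metal", "-c"])
        .arg(src)
        .arg("-o")
        .arg(air);
    cmd
}

/// Builds the command that links one or more `.air` files into a
/// single `.metallib` library.
fn link_metallib_command(sdk: &str, airs: &[&Path], out: &Path) -> Command {
    let mut cmd = Command::new("xcrun");
    cmd.args(["-sdk", sdk, "metallib"]);
    cmd.args(airs);
    cmd.arg("-o").arg(out);
    cmd
}
```

Taking the SDK name as a parameter keeps the same helpers usable for other targets; for example, passing "iphoneos" instead of "macosx" would select the iOS SDK.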

Jul 15 '24 18:07 zackangelo

Looking at #2322, the build script will likely need to be reconfigured to optionally produce iOS `.metallib` files.

Jul 15 '24 19:07 zackangelo

@LaurentMazare is this something you would potentially want to merge? If so, I can clean it up. Otherwise, we can close it.

Jul 20 '24 15:07 zackangelo

Going to close this due to inactivity. If we feel we need it in the future, this should serve as a good starting point.

Nov 19 '24 17:11 zackangelo
