WIP: Precompile metal kernels into `.metallib` files

Open zackangelo opened this issue 1 year ago • 2 comments

I'm running into an issue where the first call to `apply_repeat_penalty` takes a very long time (in excess of 6 seconds). The delay seems to come from the `Tensor::to_vec1` call that moves the logits into a `Vec<f32>`. I would expect a simple copy like this to be very fast.

It was suggested on Discord that this slowness might be due to some other async stuff happening in Metal, maybe the compilation of the kernels on first load.

This PR precompiles the kernels at build time instead of on every run. Unfortunately, it doesn't seem to solve my problem but it might be useful for other reasons.
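For anyone unfamiliar with the build-time step, here is a rough sketch of how a `build.rs` could drive Apple's `xcrun metal` and `xcrun metallib` tools. It is not the code in this PR: the helper names (`air_command`, `metallib_command`, `run`) and the file names are hypothetical, and it assumes a macOS host with the Xcode command-line tools installed.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Step 1 (hypothetical helper): compile one `.metal` source into an
/// intermediate `.air` file placed in `out_dir`.
/// Equivalent to: xcrun -sdk macosx metal -c <src> -o <out_dir>/<name>.air
fn air_command(src: &Path, out_dir: &Path) -> (Command, PathBuf) {
    let name = src.file_name().expect("source path has a file name");
    let air = out_dir.join(name).with_extension("air");
    let mut cmd = Command::new("xcrun");
    cmd.args(["-sdk", "macosx", "metal", "-c"])
        .arg(src)
        .arg("-o")
        .arg(&air);
    (cmd, air)
}

/// Step 2 (hypothetical helper): link the `.air` files into one `.metallib`.
/// Equivalent to: xcrun -sdk macosx metallib <airs...> -o <out>
fn metallib_command(airs: &[PathBuf], out: &Path) -> Command {
    let mut cmd = Command::new("xcrun");
    cmd.args(["-sdk", "macosx", "metallib"])
        .args(airs)
        .arg("-o")
        .arg(out);
    cmd
}

/// Run a command and turn a non-zero exit status into an error so that
/// the build fails loudly instead of shipping a missing library.
fn run(mut cmd: Command) -> io::Result<()> {
    let status = cmd.status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{:?} exited with {}", cmd, status),
        ))
    }
}
```

A build script's `main` would read `OUT_DIR`, call `run(air_command(..).0)` for each kernel source, then `run(metallib_command(..))` once. Targeting iOS would presumably mean passing a different `-sdk` value, which this sketch hard-codes to `macosx`.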

zackangelo avatar Jul 15 '24 18:07 zackangelo

Looking at #2322, the build script will likely need to be reconfigured to optionally produce iOS `.metallib` files.

zackangelo avatar Jul 15 '24 19:07 zackangelo

@LaurentMazare is this something you would potentially want to merge? If so, I can clean it up. Otherwise, we can close it.

zackangelo avatar Jul 20 '24 15:07 zackangelo

Going to close this due to inactivity. If we feel we need it in the future, this should serve as a good starting point.

zackangelo avatar Nov 19 '24 17:11 zackangelo