llama.cpp metal : compile-time kernel args and params

Open ggerganov opened this issue 2 years ago • 4 comments

I was just thinking about this idea, so writing it down for future research.

We should be able to fairly easily generate model-specific Metal code that has a hardcoded kernel for every single node in the computation graph. The idea is to make an initial pass over a given graph, recording all kernel calls with their respective argument values and parameters, and then generate a model-specific MSL source file with all of these kernel instances, either by copy-pasting or via templates. I guess this is similar to what people call JIT. I wonder what kind of speed-up we would be able to see with this strategy.
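A rough sketch of the record-then-generate idea, written in Python only for brevity. Everything here is hypothetical and for illustration: the `KernelCall` record, the `record_calls` and `generate_msl` helpers, the toy "scale" op, and the MSL template are assumptions, not existing llama.cpp or ggml code. The point is just that values normally passed as runtime kernel arguments (element count, scale factor) end up as literals in the generated source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelCall:
    """One recorded kernel invocation (hypothetical record layout)."""
    node: int     # index of the node in the computation graph
    op: str       # kernel family, e.g. "scale"
    n: int        # number of elements, fixed for a given model and graph
    scale: float  # op parameter that would normally be a runtime argument


# Hypothetical MSL template: values that are usually runtime kernel arguments
# are substituted as literals, so the Metal compiler sees them as constants.
SCALE_TEMPLATE = """\
kernel void kernel_{op}_node_{node}(
        device const float * src [[buffer(0)]],
        device       float * dst [[buffer(1)]],
        uint tpig [[thread_position_in_grid]]) {{
    if (tpig < {n}u) {{
        dst[tpig] = src[tpig] * {scale}f;
    }}
}}
"""


def record_calls(graph):
    """Initial pass: walk the graph once and record each kernel call."""
    return [KernelCall(node=i, op=op, n=n, scale=s)
            for i, (op, n, s) in enumerate(graph)]


def generate_msl(calls):
    """Emit one specialized kernel instance per recorded call."""
    header = "#include <metal_stdlib>\nusing namespace metal;\n\n"
    kernels = []
    for c in calls:
        if c.op != "scale":
            # Only the toy "scale" op has a template in this sketch.
            raise ValueError(f"no template for op {c.op!r}")
        kernels.append(SCALE_TEMPLATE.format(
            op=c.op, node=c.node, n=c.n, scale=repr(float(c.scale))))
    return header + "\n".join(kernels)


if __name__ == "__main__":
    # Toy graph as (op, element count, scale) tuples; values are made up.
    toy_graph = [("scale", 4096, 0.125), ("scale", 11008, 2.0)]
    print(generate_msl(record_calls(toy_graph)))
```

Running the script prints an MSL source string with one kernel per graph node. A real version would presumably need one template per op and would have to handle non-finite float parameters, which this sketch does not.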

Nov 15 '23 11:11 ggerganov

This issue was closed because it has been inactive for 14 days since being marked as stale.

Apr 02 '24 01:04 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

May 19 '24 01:05 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Jul 04 '24 01:07 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Aug 18 '24 01:08 github-actions[bot]
