llama.cpp
metal : compile-time kernel args and params
I was just thinking about this idea, so writing it down for future research.
We should be able to fairly easily generate model-specific Metal code that has hardcoded kernels for every single node in the computation graph. The idea is to make an initial pass over a given graph in which we record all kernel calls with their respective argument values and parameters, and then generate a model-specific MSL source file with all these kernel instances, either by copy-paste or via templates. I guess this is similar to what people call JIT. I wonder what kind of speed-up we would see with this strategy.
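To make the record-then-generate idea a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `scale` op, the `record_calls` and `generate_msl` helpers, the graph representation, and the MSL template are all made up and do not match the actual ggml Metal kernels or their argument layout. It only shows the shape of the approach: collect the unique (kernel, arguments) pairs seen during one pass, then emit one specialized kernel per pair with the values baked in as literals.

```python
from collections import OrderedDict

# Hypothetical MSL template for a single op. The real ggml Metal kernels
# take different arguments; this one exists only to illustrate the idea.
SCALE_TEMPLATE = """\
kernel void {name}(
        device const float * src [[buffer(0)]],
        device       float * dst [[buffer(1)]],
        uint tpig [[thread_position_in_grid]]) {{
    // n and scale are compile-time literals, not values read from an args buffer
    if (tpig < {n}u) {{
        dst[tpig] = src[tpig] * {scale}f;
    }}
}}
"""

# Hypothetical op -> template table; a real generator would need one per kernel.
TEMPLATES = {"scale": SCALE_TEMPLATE}


def record_calls(graph):
    """Initial pass: collect unique (op, args) pairs in first-seen order.

    `graph` is assumed to be a list of (op_name, args_dict) tuples, standing
    in for the kernel calls recorded while evaluating the computation graph.
    Returns a mapping from (op, sorted args) to a generated kernel name.
    """
    seen = OrderedDict()
    for op, args in graph:
        key = (op, tuple(sorted(args.items())))
        seen.setdefault(key, f"kernel_{op}_{len(seen)}")
    return seen


def generate_msl(instances):
    """Emit one specialized kernel per recorded (op, args) pair."""
    parts = ["#include <metal_stdlib>", "using namespace metal;", ""]
    for (op, args), name in instances.items():
        params = dict(args)
        parts.append(TEMPLATES[op].format(
            name=name,
            n=int(params["n"]),
            # repr(float(...)) always contains a decimal point or exponent,
            # so appending "f" in the template gives a valid float literal
            scale=repr(float(params["scale"])),
        ))
    return "\n".join(parts)


if __name__ == "__main__":
    # Two identical calls collapse into one instance; the third is distinct.
    graph = [
        ("scale", {"n": 4096, "scale": 0.125}),
        ("scale", {"n": 4096, "scale": 0.125}),
        ("scale", {"n": 11008, "scale": 0.5}),
    ]
    print(generate_msl(record_calls(graph)))
```

Deduplicating on the argument values keeps the generated source from growing with the number of graph nodes; it grows only with the number of distinct (kernel, arguments) combinations. Whether baking the values in actually yields a measurable speed-up is exactly the open question above.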
This issue was closed because it has been inactive for 14 days since being marked as stale.