LLaMA_MPS
Support Apple Neural Engine (ANE) Transformers
I noticed Apple supports ANE Transformers.
According to their own words, an M1 or newer chip can achieve up to 10 times faster inference and 14 times lower peak memory.
Does that mean running 30B or 65B will be possible on small-memory MacBooks?
Here are a few links:
- https://github.com/apple/ml-ane-transformers
- https://machinelearning.apple.com/research/neural-engine-transformers
As this project is the leading LLaMA implementation that leverages the Apple GPU, would it be possible to support the ANE too?
I don't know whether that would provide much speedup for current LLM architectures, which are memory-bound during autoregressive decoding. It might be more useful for Stable Diffusion (compute-bound) or MegaByte-style transformers.
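A rough sketch of the memory-bound argument: each decoded token requires reading every weight once, so memory bandwidth sets a floor on latency regardless of how fast the compute engine is. The parameter count, precision, and bandwidth figures below are illustrative assumptions, not measured numbers for any specific chip.

```python
# Back-of-envelope: why autoregressive LLM decoding is memory-bandwidth bound.
# All numbers are assumed for illustration, not measured.

def decode_floor_s_per_token(n_params: float, bytes_per_param: float, mem_bw_gbs: float) -> float:
    """Lower bound on seconds per token, assuming every weight is read once per token."""
    weight_bytes = n_params * bytes_per_param
    return weight_bytes / (mem_bw_gbs * 1e9)

# Hypothetical: a 30B-parameter model in fp16 on a chip with ~200 GB/s memory bandwidth
t = decode_floor_s_per_token(30e9, 2, 200)
print(f"floor: {t:.2f} s/token")  # ~0.30 s/token, no matter how fast the ANE computes
```

Under these assumptions a faster compute engine like the ANE cannot push past that ~0.3 s/token floor; only lower precision (fewer bytes per parameter) or higher memory bandwidth moves it.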