llamafile
Feature Request: Apple Silicon Neural Engine - Core ML model package format support
Description
Please consider adding Core ML model package format support to utilize the Apple Silicon Neural Engine + GPU.
Success Criteria
Utilize both the ANE and GPU, not just the GPU, on Apple Silicon.
Additional Context
List of Core ML package format models https://github.com/likedan/Awesome-CoreML-Models
There is work in progress on a Core ML implementation for whisper.cpp; they see up to 3x performance improvements for some models. You might be interested in the discussion: https://github.com/ggerganov/whisper.cpp/discussions/548
You might also be interested in another implementation, Swift Transformers. An example of a Core ML application: https://github.com/huggingface/swift-chat
Core ML is a framework that can distribute a workload across the CPU, GPU, and Neural Engine (ANE). The ANE is available on all modern Apple devices: iPhones and Macs (A14 or newer and M1 or newer). Ideally, we want to run LLMs on the ANE only, as it has optimizations for running ML tasks that the GPU lacks. Apple claims you can deploy "your Transformer models on Apple devices with an A14 or newer and M1 or newer chip to achieve up to 10 times faster and 14 times lower peak memory consumption compared to baseline implementations".
- To utilize Core ML, you first need to convert a model from TensorFlow or PyTorch to the Core ML model package format using coremltools (or simply use existing models already in the Core ML package format).
- Second, you use that converted package with an implementation designed for Apple devices. Here is Apple's reference PyTorch implementation:
https://machinelearning.apple.com/research/neural-engine-transformers
https://appleinsider.com/articles/24/05/07/secret-apple-project-acdc-to-pioneer-ai-chips-for-data-centers
Under the internal name "Project ACDC," Apple is developing Apple Silicon designed specifically for server farms dedicated to AI processing. The company aims to optimize AI applications within its data centers for future versions of its platforms.
I second this. Support for Swift and the ANE would be helpful for iOS and Mac developers.
llama.cpp supports so many platforms, yet judging by https://github.com/ggerganov/llama.cpp/discussions/336 it sounds like even they don't support Apple's Neural Engine yet, which (not knowing much about it myself) seems like a red flag. There's a lot of skepticism in that thread about the value it offers transformer models compared to the M2 CPU and Metal GPU.
Based on what I've read here, it sounds like Apple asks too much. The goal of llamafile is to package LLMs into a single executable file that runs on many platforms. If the Apple Neural Engine requires that we encode the weights in a proprietary format that only works on their hardware, then we'd need to ship llamafiles that target their platform exclusively.
I get the impression the Neural Engine is a great technology for helping Apple App Store developers build the highest quality apps possible. As much as I'd love to play a part in enabling that, it's hard for us to lend support to a platform when doing so pulls us away from serving the other platforms we're committed to supporting too.