
[BOUNTY - $1000] Compile tinygrad to swift

Open AlexCheema opened this issue 1 year ago • 26 comments

  • I want to keep exo 100% python if possible
  • Would like to compile swift or objc inference code in tinygrad
  • The deliverable here is a merged PR in tinygrad and a small demonstration in exo of how this would be used to run a tinygrad model on iOS. Since running Python on iOS is a difficult task in itself (see #352), it would be sufficient to show two things:
  1. Swift or objc code generated from tinygrad running on iOS
  2. A new InferenceEngine implementation in exo (Python) that can trigger the execution of models by executing the generated swift or objc code. This can be demonstrated on a Mac
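A minimal sketch of what deliverable 2 might look like, assuming a simple CLI contract (JSON on stdin/stdout) between Python and the compiled Swift binary; the class name, the contract, and the stand-in command are all hypothetical illustrations, not exo's actual API:

```python
import json
import subprocess

# Hypothetical sketch of deliverable 2: a Python "InferenceEngine" that
# triggers generated Swift/objc inference code. The names and the CLI
# contract (JSON in, JSON out) are assumptions for illustration.
class SwiftInferenceEngine:
    def __init__(self, command):
        # `command` would normally be the compiled Swift binary; it is
        # injectable here so the sketch can be exercised with any
        # stand-in executable.
        self.command = command

    def infer(self, input_tensor):
        payload = json.dumps({"input": input_tensor})
        # Hand the input to the external process and read its JSON reply.
        result = subprocess.run(
            self.command, input=payload, capture_output=True,
            text=True, check=True,
        )
        return json.loads(result.stdout)

# Demo with `cat` as a stand-in for the Swift binary: it echoes the JSON
# payload back, so the "output" equals the input we sent.
engine = SwiftInferenceEngine(["cat"])
print(engine.infer([0.1, 0.2, 0.3]))  # → {'input': [0.1, 0.2, 0.3]}
```

The real engine would point `command` at the binary built from the generated Swift source, which is what makes a Mac-only demonstration possible.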

AlexCheema avatar Sep 27 '24 15:09 AlexCheema

Hey @AlexCheema, I am interested in this. Can I take up this bounty? I can start working on it right away.

sambhavnoobcoder avatar Sep 28 '24 10:09 sambhavnoobcoder

yes, please do! I've assigned you now

AlexCheema avatar Sep 28 '24 15:09 AlexCheema

feel free to join the discord @sambhavnoobcoder there's a bounties channel

AlexCheema avatar Sep 28 '24 15:09 AlexCheema

fyi this is pretty much entirely a tinygrad project https://github.com/tinygrad/tinygrad it would be incredibly valuable to exo to have this though, which is why we are offering the $1,000 bounty for it. the deliverable here is a merged PR in tinygrad and a small demonstration in exo of how this would be used to run a tinygrad model on iOS.

AlexCheema avatar Sep 30 '24 17:09 AlexCheema

Got it. I have written some code and will raise a draft PR in 1-2 days for review, then build it up from there.

sambhavnoobcoder avatar Oct 01 '24 08:10 sambhavnoobcoder

Hey @AlexCheema, I have written some changes and tested them using a unit-test file that I wrote. I just wanted to ask: what is the expected method of testing for this PR? What would you expect from a successful test and demo for a PR that solves this issue?

Screenshot 2024-10-06 at 8 15 14 PM

sambhavnoobcoder avatar Oct 06 '24 14:10 sambhavnoobcoder

What Swift code are you generating currently? What I want to see is an end-to-end example of how this would be integrated into exo so that we can run performant inference code on iOS with Metal GPU acceleration.

AlexCheema avatar Oct 06 '24 17:10 AlexCheema

I assume you want it to use METAL on iOS? If CPU is okay, compile_efficientnet example is pretty much already this.

Yea, I see above you want GPU. Shouldn't be too hard, you can run the model in a TinyJit and access the kernels. Then you have the src for the kernels and the order, probably 50 lines of boilerplate to run them.
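A rough sketch of the boilerplate this suggests: assume the TinyJit run yields an ordered list of (kernel name, Metal source, global size, local size); generating the Swift runner is then mostly string templating. The captured kernel and the emitted Swift below are illustrative stand-ins, not real tinygrad output:

```python
# Hypothetical sketch: kernels as they might be captured from a TinyJit run
# (hard-coded here; in tinygrad they would come from the jit cache).
CAPTURED_KERNELS = [
    # (name, Metal source, global size, local size) -- illustrative only
    ("E_4",
     "kernel void E_4(device float *out [[buffer(0)]], "
     "uint gid [[thread_position_in_grid]]) { out[gid] = 0.0; }",
     (4, 1, 1), (4, 1, 1)),
]

def emit_swift(kernels):
    # Template the captured kernels into a Swift runner that compiles each
    # Metal source string and would dispatch them in the captured order.
    lines = ["import Metal", "", "let device = MTLCreateSystemDefaultDevice()!"]
    for name, src, gsz, lsz in kernels:
        lines += [
            f'let src_{name} = """',
            src,
            '"""',
            f"let lib_{name} = try! device.makeLibrary(source: src_{name}, options: nil)",
            f'let fn_{name} = lib_{name}.makeFunction(name: "{name}")!',
            f"// dispatch {name}: global {gsz}, local {lsz}",
        ]
    return "\n".join(lines)

print(emit_swift(CAPTURED_KERNELS))
```

The dispatch boilerplate (command queue, pipeline state, encoder) is elided; the point is that the kernel sources and their order are enough to reconstruct the whole computation on the Swift side.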

geohot avatar Oct 08 '24 14:10 geohot

Thank you @geohot for the suggestion. Using your advice, I was able to compile tinygrad to Swift, save it, and pass it down the pipeline. However, @AlexCheema, when you said you needed an end-to-end demonstration, could you clarify what you require? Do you need a video of an iOS application running this interface, or is there some other way to demonstrate it working? Currently I get the generated Swift code and this information:

Screenshot 2024-10-09 at 7 40 00 PM

If you could clear this up, I could start prepping the demo accordingly, and once you approve, I'll go through with the PR procedure.

sambhavnoobcoder avatar Oct 09 '24 14:10 sambhavnoobcoder

So I'm actually not sure what the best thing is here.

The thing I'm hoping to avoid is a separate iOS implementation of exo: I want one codebase that's close to 100% Python.

One Frankenstein setup I was thinking about was:

  • generating some swift code that can run inference on Metal GPU as @geohot described (ideally generate an InferenceEngine implementation)
  • bringing a Python interpreter into an iOS app (e.g. using Pyto) to run the main exo node software
  • that python code should be able to call the generated swift code for inference (with Pyto this would be through a "rubicon")

This seems convoluted, so maybe there's a simpler way to avoid maintaining a separate iOS implementation while keeping one Python implementation.

AlexCheema avatar Oct 09 '24 16:10 AlexCheema

https://github.com/user-attachments/assets/b4dc07ad-6d77-40f7-be2b-a834f02334e9

@AlexCheema I've developed a demo that shows the compilation of TinyGrad models to Swift for Metal acceleration. The process involves three main steps:

  1. We run compile_to_swift.py, which compiles a TinyGrad model (in this case, EfficientNet) to Swift. This generates EfficientNetMetalInferenceEngine.swift, an inference engine for Metal computations in Swift.
  2. We build the generated Swift file using the Swift command-line tools to verify its robustness and integrity.
  3. We use the compiled Swift file in metal_inference_demo.py, which demonstrates the integration of the TinyGrad-compiled Swift code with exo. This script utilizes the generated Swift inference engine, leveraging Metal as its accelerator, to make predictions equivalent to those of EfficientNet. Successful initialization, along with non-zero, non-NaN, sensible outputs that align with typical EfficientNet predictions, would confirm that (a) the pipeline functions correctly and (b) the outputs are accurate.

This demonstration shows that we can run a TinyGrad model on iOS using Metal acceleration. I plan to enhance this approach by making it more dynamic with TinyJit, which will allow for more flexible and comprehensive kernel generation. Once I've implemented these changes, I'll raise a PR in tinygrad. Does this align with what you were expecting for the demo? I wanted to confirm that this approach meets your requirements before proceeding with the TinyJit enhancements and submitting the PR.

sambhavnoobcoder avatar Oct 10 '24 22:10 sambhavnoobcoder

  1. Does this run on the GPU?
  2. How does the exo integration work?
  3. Have you looked at my previous comment about running the exo Python code on iOS using something like Pyto? I'd like to see a demonstration of exo working end-to-end on iOS.

AlexCheema avatar Oct 14 '24 22:10 AlexCheema

  1. The current implementation runs on a Mac M1 Air, which features an integrated GPU. Given that Metal support is activated, we can reasonably assume that it's utilizing the GPU component.
  2. While this implementation is now largely independent of Exo, I plan to make minor modifications to the inference engine. I'll add it to the existing workflows as an optional MetalInferenceEngine. This will be similar to how models.py currently offers multiple inference engines like MLX and tinygrad for LLaMA and others. The Metal-based engine will be available as another option.
  3. Initially, I thought the previous demonstration would be sufficient, as it executed the Swift code flawlessly. However, I'm open to exploring other alternatives. I'm not familiar with Pyto, so any resources or references regarding its usage would be greatly appreciated. Also, could you clarify what you mean by an "end-to-end demo"? Are you referring to running Exo on an iOS device like an iPhone? If so, I may face some hardware limitations on my end.
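Point 2 amounts to registering the new engine next to the existing ones; a tiny sketch of what that selection might look like (the registry shape and class names are assumptions for illustration, not exo's actual models.py):

```python
# Hypothetical sketch of offering the Metal engine as one more option
# alongside the existing engines; the registry and class names are
# illustrative, not exo's real code.
class TinygradInferenceEngine:
    name = "tinygrad"

class MLXInferenceEngine:
    name = "mlx"

class MetalSwiftInferenceEngine:
    name = "metal-swift"  # backed by the generated Swift code

ENGINES = {cls.name: cls for cls in
           (TinygradInferenceEngine, MLXInferenceEngine, MetalSwiftInferenceEngine)}

def get_inference_engine(name):
    # Fall back to tinygrad if the requested engine is unknown.
    return ENGINES.get(name, TinygradInferenceEngine)()

print(get_inference_engine("metal-swift").name)  # → metal-swift
```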

sambhavnoobcoder avatar Oct 15 '24 22:10 sambhavnoobcoder

WIP but I've been able to generate swift code by modifying ops_metal in tinygrad, this is tinygrad's gpt2 example in swift which uses these kernels. The output tokens are the same as tinygrad's (~until it crashes after 6 tokens, not sure why yet I think because of memory usage~ only for one token until numpy() works). Is this the kind of thing you would want outputted? Also I can't get JIT working with it yet, which would make the code much shorter. Is swift a requirement? I think this would be easier to do in objc.

roryclear avatar Oct 17 '24 20:10 roryclear

Does this run on the GPU?

objc would also work.

AlexCheema avatar Oct 18 '24 18:10 AlexCheema

Yeah, it's running the same Metal kernels as tinygrad, so it should be all GPU; I have checked the usage too, though. I just wanted to make sure of your requirements.

roryclear avatar Oct 18 '24 21:10 roryclear

https://github.com/roryclear/tinygrad/tree/ios I need to clean this up a lot. As far as inference goes, I've only tested this with gpt2 at the moment. If you have anything useful written in tinygrad that will fit on my iPhone 13, I can hopefully show it running.

Running tinygrad on that branch with IOS=1 should add the objc and Metal code to the iOS project in the branch. I haven't got numpy() working yet, so the script stops once data is copied out of Metal to Python.
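The IOS=1 flow can be pictured as a small environment-variable gate in the compile path; this is a guess at the shape (the gate, paths, and names are invented for illustration, not the actual branch code):

```python
import os

def collect_for_ios(name, src, exported):
    # When IOS=1 is set, stash each compiled kernel's source so it can
    # later be written into the iOS project; otherwise behave as normal.
    # The ios_project/ layout is invented for this sketch.
    if os.getenv("IOS") == "1":
        exported[f"ios_project/kernels/{name}.metal"] = src
    return exported

os.environ["IOS"] = "1"
files = collect_for_ios("E_4", "kernel void E_4(...) {}", {})
print(sorted(files))  # → ['ios_project/kernels/E_4.metal']
```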

roryclear avatar Oct 21 '24 19:10 roryclear

I updated the initial instructions to be clearer. Since a few people are working on this, the merged PR will receive the full 1000 USD bounty; the other submissions (provided they are complete) will receive 500 USD each.

AlexCheema avatar Oct 23 '24 07:10 AlexCheema

Sounds good. I was never bullish about getting iOS merged into tinygrad (I'm not sure they want it), but I wanted to try this anyway.

https://github.com/user-attachments/assets/9d51539b-1011-4616-a31c-f4c0ef5befed

My fork should be somewhat readable now. This is tinygrad's gpt2 implementation running on iOS. It only does the first token, as there's a small bit of non-tinygrad code between tokens that isn't captured. But if a node is just taking in data, doing inference on the GPU, and then sending the results elsewhere (I may be wrong about this; I haven't used exo yet), I think this should have enough functionality.

edit: I'm also going to try to get this working with grpc and have iOS behave like any other device in tinygrad, rather than compiling everything before running.

roryclear avatar Oct 24 '24 14:10 roryclear

@sambhavnoobcoder you don't have an iPhone, yet you're trying to develop this?

darkBuddha avatar Nov 11 '24 12:11 darkBuddha

Sadly no, I have only been testing on my Mac so far. My PR is also currently built and tested solely on my Mac.

sambhavnoobcoder avatar Nov 12 '24 06:11 sambhavnoobcoder

@sambhavnoobcoder how would you even test your PR, then?

darkBuddha avatar Nov 12 '24 06:11 darkBuddha

Well, so far I've been trying to emulate an iPhone on my Mac for testing and validating the Swift inference engine. However, it has been very hard to debug errors that way, which is partly why progress on the PR has been slow.

sambhavnoobcoder avatar Nov 12 '24 10:11 sambhavnoobcoder

hey @sambhavnoobcoder, I would like to test this with you to help you complete it. I have a device we can use if you want to jump on a screen share at some point.

cadenmackenzie avatar Nov 14 '24 22:11 cadenmackenzie

Hey @cadenmackenzie, thanks for the help; it is much appreciated. I'd love to collaborate on this with you. I'll reach out to you on Discord, and we can hopefully pick it up from there.

sambhavnoobcoder avatar Nov 16 '24 16:11 sambhavnoobcoder

tinygrad running on iOS: https://github.com/roryclear/tinygrad/pull/1. No grpc, just the requests library in Python.
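The no-grpc approach can be sketched end to end with the stdlib (urllib standing in for requests): the host posts kernel work to the device over HTTP and reads the result back, with a local http.server playing the part of the iPhone. The /run endpoint and the payload shape are assumptions for illustration, not the PR's actual protocol:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request as urlrequest

# Stand-in for the iOS node: receives kernel work, "executes" it, replies.
class DeviceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # A real device would run the Metal kernel here; we just double the data.
        out = [x * 2 for x in body["data"]]
        reply = json.dumps({"output": out}).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), DeviceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def run_remote(port, kernel_name, data):
    # Post one unit of work to the device and return its output buffer.
    payload = json.dumps({"kernel": kernel_name, "data": data}).encode()
    req = urlrequest.Request(f"http://127.0.0.1:{port}/run", data=payload)
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read())["output"]

result = run_remote(server.server_port, "E_4", [1, 2, 3])
print(result)  # → [2, 4, 6]
server.shutdown()
```

With this shape, the iOS device behaves like any other tinygrad device: the host never needs everything compiled ahead of time, only a reachable endpoint.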

roryclear avatar Nov 18 '24 13:11 roryclear