candle
Metal iOS
Great framework!
Is the usage of Metal already possible on iOS? I'm trying to run the Phi example on iOS and I can only get it to work with a CPU device but not with Metal. MTLStorageModeManaged isn't available on iOS.
I've never tried compiling for iOS, but issue #1759 seems related.
I had a try at it here, but my code, which is based on candle-examples/examples/phi, generates mostly blank tokens on my iPad. I'm not sure what the issue is at this point. TBC
Thanks a lot @LaurentMazare for the great work.
I tried running it on iOS and as the OP noted, it works well but only on CPU. When using the GPU, the Metal code crashes because candle is explicitly using MTLResourceOptions::StorageModeManaged in a few places. The managed mode is not available on iOS and tvOS so the code panics.
I tried simply changing it to be shared everywhere, but that also panics because iOS has a buffer size limit of 256MB (see: https://github.com/gfx-rs/metal-rs/blob/master/src/device.rs#L712-L718) and at least trying to run Phi would attempt to allocate more.
Can we somewhere control the buffer size that is being used by candle?
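To make the buffer size question concrete, here is a minimal sketch of what a size guard could look like. Everything in it is hypothetical and not part of candle: the names `ASSUMED_MAX_BUFFER_BYTES`, `BufferTooLarge`, `check_buffer_size` and `split_into_chunks` are made up for illustration, and the limit is just the 256MB figure quoted above. A real implementation would presumably query the limit from the device (Metal exposes it as `maxBufferLength`) rather than hard-code it.

```rust
/// Assumed per-buffer limit, taken from the 256MB figure quoted in this thread.
/// Hypothetical constant: a real build would ask the Metal device instead.
const ASSUMED_MAX_BUFFER_BYTES: u64 = 256 * 1024 * 1024;

/// Hypothetical error returned instead of panicking on an oversized request.
#[derive(Debug, PartialEq, Eq)]
struct BufferTooLarge {
    requested: u64,
    limit: u64,
}

/// Returns the requested size if it fits under `limit`, otherwise an error.
fn check_buffer_size(requested: u64, limit: u64) -> Result<u64, BufferTooLarge> {
    if requested <= limit {
        Ok(requested)
    } else {
        Err(BufferTooLarge { requested, limit })
    }
}

/// Splits `total` bytes into chunk sizes that each fit under `limit`.
/// Every chunk is `limit` bytes except possibly the last one.
fn split_into_chunks(total: u64, limit: u64) -> Vec<u64> {
    assert!(limit > 0, "limit must be non-zero");
    let mut chunks = Vec::new();
    let mut remaining = total;
    while remaining > 0 {
        let size = remaining.min(limit);
        chunks.push(size);
        remaining -= size;
    }
    chunks
}
```

With a guard like this, an oversized request would surface as an error instead of a panic, and a caller could decide to split the allocation. Whether candle's tensors can actually be split across several buffers is a separate question that this sketch does not answer.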
Ref https://github.com/huggingface/candle/issues/2322 ... @filipw I've stumbled upon the same issue. Did you eventually manage to make it work on iOS?
I'll repost my comment from #2322 here because I think it's super relevant and might be a very good lead to fix the issue:
After a bit of digging into Apple MLX and especially how they handle buffer allocation on both macOS and iOS, I found this https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/allocator.cpp#L207
You will see that all allocations are centralized there and that they only ever use ResourceStorageModeShared (you won't find references to other storage modes in their Metal backend). So it seems that, on iOS at least, managed buffers are not needed. That makes sense if we think about it: Metal on macOS must support both Intel (where GPU and RAM are not unified) and Apple Silicon, while Metal on iOS only cares about unified memory, hence no syncing is needed or supported.
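As a rough illustration of the idea, the storage mode could be picked in one central place based on the target OS. This is only a sketch under assumptions: `StorageMode`, `storage_mode_for` and `default_storage_mode` are hypothetical names, the enum is a local stand-in for Metal's storage modes rather than the metal-rs type, and keeping the managed mode on macOS simply mirrors what this thread says candle does today. MLX, as noted above, uses the shared mode everywhere.

```rust
/// Local stand-in for the two Metal storage modes discussed in this thread.
/// Hypothetical type: not the one from the metal-rs crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StorageMode {
    Shared,
    Managed,
}

/// Picks a storage mode from an OS name such as "macos" or "ios".
/// Assumption: managed stays on macOS, shared is used everywhere else,
/// since the managed mode is not available on iOS and tvOS.
fn storage_mode_for(target_os: &str) -> StorageMode {
    match target_os {
        "macos" => StorageMode::Managed,
        _ => StorageMode::Shared,
    }
}

/// Resolves the mode for the OS this binary was compiled for.
fn default_storage_mode() -> StorageMode {
    storage_mode_for(std::env::consts::OS)
}
```

In real code the same decision would more likely be made at compile time with `#[cfg(target_os = "ios")]`, but taking the OS name as a parameter keeps the sketch easy to check on any machine. Centralizing the choice like this would also make it simple to try shared everywhere, as MLX does.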
Looking forward to your thoughts.