Unable to build candle with flash attention on iOS
When I try to build and run a Llama 3.2 1B model on iOS (iPhone 14) with flash attention on Metal, I get `/Users/jpchen/.cargo/git/checkouts/candle-6740f55d69a3bf41/b4ec636/candle-transformers/src/models/llama.rs:254:5: not implemented: compile with '--features flash-attn'`.
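From the message it looks like a `flash-attn` cargo feature needs to be enabled. For context, this is roughly what I'd expect to add in my Cargo.toml (only the `flash-attn` feature name comes from the error itself; the git dependency layout and the `metal` feature are my guesses at how it would be wired up):

```toml
# Cargo.toml (sketch): only the "flash-attn" feature name is taken from the
# error message; the git dependency layout and the "metal" feature are my
# assumptions about how the crates would be configured.
[dependencies]
candle-core = { git = "https://github.com/huggingface/candle", features = ["metal"] }
candle-transformers = { git = "https://github.com/huggingface/candle", features = ["flash-attn"] }
```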
I'm a little unfamiliar with Candle. I see that flash attention is supported for Metal hardware, and I was curious whether this is an iOS-specific limitation or whether there's a way I could build the project to get flash attention support. Thanks.