Simone Margaritelli
My branch compiles and runs on iOS: I'm still getting weird errors when I try to have my macOS M1 do computations with an...
Fixes happening here: https://github.com/evilsocket/candle Integration happening here: https://github.com/evilsocket/llama3-cake
I have a feeling it comes from this: https://codebrowser.dev/tokio/crates/rand-0.8.5/src/distributions/weighted_index.rs.html#454. I'm trying to narrow it down, but to be honest I have very little experience with Candle so it's taking...
tensor printf debugging ftw
@LaurentMazare you were 100% right, the error is in the logits output vector being full of NaN, the sampling doesn't like that ... disabling the kv_cache on both workers and...
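For anyone hitting the same thing, here is a minimal sketch of the kind of check that would surface this before the sampler does. It assumes the logits have already been copied out of the tensor into a plain `f32` slice (with Candle, presumably something like `to_vec1::<f32>()`); `nan_count` and `check_logits` are hypothetical helper names, not part of Candle or rand.

```rust
/// Count how many logits are NaN.
/// Hypothetical helper for debugging, not a Candle or rand API.
fn nan_count(logits: &[f32]) -> usize {
    logits.iter().filter(|v| v.is_nan()).count()
}

/// Refuse to hand the logits to a sampler if any of them is NaN,
/// and report where the first bad value sits.
fn check_logits(logits: &[f32]) -> Result<(), String> {
    match logits.iter().position(|v| v.is_nan()) {
        Some(idx) => Err(format!(
            "{} of {} logits are NaN (first at index {})",
            nan_count(logits),
            logits.len(),
            idx
        )),
        None => Ok(()),
    }
}
```

Calling `check_logits` right after the forward pass of each worker should make it easier to see which side first produces the NaN values, rather than finding out from a sampling error later.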
@LaurentMazare it seems to depend on some (version?) discrepancy between the libMetalFlashAttention.metallib bundled in this repo vs the iOS and macOS libMetalFlashAttention.metallib bundled here https://github.com/philipturner/metal-flash-attention/releases/tag/v1.0.1 In this issue https://github.com/huggingface/candle/issues/1759 @ivarflakstad...
@ivarflakstad understood, I'll try to compile this for iOS https://github.com/FL33TW00D/metal-flash-attention and wait for a proper fix, thanks
@ivarflakstad just tried to compile @FL33TW00D's fork (last commit), that is not the one: ``` found name = Apple M1 Max using name = Apple M1 Max thread 'main' panicked...
I've literally tried to compile all branches of that repo and all of them fail the tests.
@ivarflakstad I became a little bit obsessed with this so I tried to compile and test every single commit of every branch of both forks of libMetalFlashAttention.metallib, both for macOS...