ivarflakstad

Results 12 comments of ivarflakstad

Fair enough 😊 Maybe you know who does?

I have [this](https://github.com/huggingface/candle/tree/metal-mfa-bfloat) branch with working bfloat matmul. I'm testing Falcon on it now (still downloading). It is based on work I've done [here](https://github.com/ivarflakstad/metal-flash-attention/tree/temp-bfloat-work-stash) which is not ready to be...

If you have enough RAM you should be able to run Falcon on the candle branch I mentioned above. Here I am running Mamba (130m) with bf16:

I have an M1 Pro with 32 GB. Metal: 7.30 tokens/s vs accelerate: 2.28 tokens/s. Is it still slow for you?

I'm on the main branch. That's why I'm asking if it is still slow for you :)

There is experimental Metal support. Not using MPS right now, but I might add it in the future as a fallback for compatibility reasons. So yes there is...

Memory could be the issue, but then I would expect your computer to be showing signs of that as you are running the model. Is it? For comparison, could you...

@bayedieng I recently refurbished the buffer allocator for Metal, and that change is now merged into main. Would you mind checking whether it has improved the issue? :)

Ok thanks. Could you try using `cargo-instruments -t Allocations` and share what it looks like? :)

Hmm I see, that's a valid point. Still I want to emphasize that it is the exact same "bounds" as were there originally - except it was expressed through if...