Awni Hannun
That looks like a bug 🤔
The `bfloat` thing is a different issue. I will send a fix for the add shortly. Could you put the bfloat problem in a separate issue, as it might be harder...
Could you please share the error message and a way to reproduce it?
Not yet. We are working on it. Probably makes sense to follow up on this once we have some basic support there.
We don't have an immediate plan for this. I'm sure there is potential to make CPU convolutions faster. I'm curious though, why not use the GPU instead?
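For reference, a minimal sketch of running the convolution on the GPU in MLX (the shapes here are made up for illustration; MLX convolutions take NHWC inputs and OHWI weights):

```python
import mlx.core as mx

# Illustrative shapes: NHWC input, OHWI weights.
x = mx.random.normal((1, 32, 32, 3))   # batch, height, width, channels
w = mx.random.normal((16, 3, 3, 3))    # out_channels, kH, kW, in_channels

# Run the convolution on the GPU by passing the device as the stream...
y_gpu = mx.conv2d(x, w, stride=1, padding=1, stream=mx.gpu)

# ...or on the CPU, to compare the two.
y_cpu = mx.conv2d(x, w, stride=1, padding=1, stream=mx.cpu)

mx.eval(y_gpu, y_cpu)
print(y_gpu.shape)  # (1, 32, 32, 16)
```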
I don't think we've benchmarked the convs in BNNS but it would be interesting to see how they perform. There would have to be a copy in and out as...
Benchmarks:

No degradation in token generation:

```
python -m mlx_lm.generate --model mlx-community/NeuralBeagle14-7B-4bit-mlx --prompt "Write a story about Albert Einstein" --temp 0.0 --max-tokens 256
```

```
Pre:
Prompt: 219.423 tokens-per-sec
Generation: ...
```
@jagrit06 @angeloskath I think this is ready for review.
For a review, the main things to look at are:
- The updated way the compiled includes are made: `metal/CMakeLists.txt`, `metal/jit/includes.h`, and `metal/make_compiled_preamble.sh`
- The way primitives get or build kernels: ...
I wonder why it's so slow, and whether there is a way to asynchronously free resources. Indeed, I've noticed in the past that the program can take a while to...
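As an aside, a sketch of one way to explicitly drop cached buffers, assuming the slow teardown is related to the Metal buffer cache (the array size here is arbitrary):

```python
import mlx.core as mx

# Allocate and release a large temporary so the allocator's buffer cache
# holds some memory.
a = mx.random.normal((4096, 4096))
mx.eval(a)
del a

# Inspect and then drop the cached Metal buffers before teardown.
print(mx.metal.get_cache_memory())
mx.metal.clear_cache()
print(mx.metal.get_cache_memory())
```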