Awni Hannun

1014 comments by Awni Hannun

Quantized matmuls are WIP for the CUDA back-end. It's probably the top priority, hopefully they will be in an upcoming release 🤞 You should be able to run benchmarks for...

This is a pretty major undertaking and it's unlikely we will have bandwidth to work on it in the near future. We'd need a Vulkan runtime back-end and presumably we'd...

So there are a couple things you should change in general about your Llama implementation:

1. Use `nn.RMSNorm` instead of rolling your own
2. Use `nn.RoPE` instead of rolling your...
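For reference, a minimal NumPy sketch of the computation `nn.RMSNorm` performs (normalize by the root-mean-square over the last axis, then scale by a learned weight; the `eps=1e-5` default is an assumption here):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # Normalize by the root-mean-square over the last axis, then scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

x = np.array([[1.0, 2.0, 3.0, 4.0]])
out = rms_norm(x, np.ones(4))
print(np.sqrt(np.mean(out * out)))  # close to 1: unit RMS after normalization
```

Using the built-in module instead of a hand-rolled version like this also lets MLX dispatch to its fused kernel.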

It's running on my M1 Max (32GB) with this command:

```
python -m mlx_vlm.lora --model-path Llama-3.2-11B-Vision-Instruct-4bit --dataset 5CD-AI/Viet-ShareGPT-4o-Text-VQA --split Viet_OCR_VQA --steps 100 --learning-rate 5e-6 --lora-rank 16 --lora-alpha 16
```

and...

Could you try upgrading to the latest MLX (0.18.1) (and if it's used here MLX LM (0.19.1)) just to be sure we didn't fix something.. (I think this PR may...

> Thank you! Do you have any tips specific to MLX?

First verify that data loading is in fact the issue. I would do that by using the same batch...
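One way to run that check is to time training steps fed fresh batches from the loader versus the same cached batch reused every step; a rough timing sketch, where `train_step` and `loader` are hypothetical stand-ins for your real ones:

```python
import time

def time_steps(step_fn, batches, warmup=2):
    # Average seconds per step over `batches`, skipping warmup iterations.
    batches = list(batches)
    for b in batches[:warmup]:
        step_fn(b)
    start = time.perf_counter()
    for b in batches[warmup:]:
        step_fn(b)
    return (time.perf_counter() - start) / max(len(batches) - warmup, 1)

# Hypothetical stand-ins for a real training step and data loader.
def train_step(batch):
    return sum(batch)

loader = [[float(i)] * 64 for i in range(32)]

fresh = time_steps(train_step, loader)             # new batch each step
cached = time_steps(train_step, [loader[0]] * 32)  # same batch every step
# If `fresh` is much larger than `cached`, data loading is the bottleneck.
print(fresh >= 0 and cached >= 0)
```

With a real model the gap between the two numbers directly bounds how much a faster data pipeline could help.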

It's not exactly on the roadmap.. but we would be happy to accept it as a contribution.

First, just a heads up, shapeless compilation/export can fail and it often won't tell you. So it's recommended to [use it carefully](https://ml-explore.github.io/mlx/build/html/usage/compile.html#shapeless-compile):

> Use shapeless compilations carefully. Since compilation is...
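As a toy illustration of why that silence is dangerous (this is a plain-Python/NumPy analogy, not MLX's actual tracing machinery): shape-dependent control flow gets frozen at trace time, so the traced function silently computes the wrong thing for other shapes:

```python
import numpy as np

def fn(x):
    # Shape-dependent control flow: unsafe under shapeless compilation.
    if x.shape[0] > 2:
        return x * 2
    return x + 1

def toy_shapeless_trace(f, example):
    # Mimics tracing: the branch taken for `example` is baked in forever.
    return (lambda x: x * 2) if example.shape[0] > 2 else (lambda x: x + 1)

compiled = toy_shapeless_trace(fn, np.zeros(4))  # traced on shape (4,)
small = np.zeros(2)
# Eager fn takes the other branch; the "compiled" version never notices.
print(np.array_equal(fn(small), compiled(small)))  # False
```

No error is raised anywhere, which is exactly the failure mode the docs warn about.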

It should be quite doable to add a complex GEMM for the CPU backend using BLAS (zgemm or something). As for the Metal backend it will probably be more work...
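For a backend without a complex BLAS to call, one common approach (an assumption on my part, not a committed design) is to express the complex GEMM as four real GEMMs; a NumPy sketch of that decomposition, checked against NumPy's native complex matmul:

```python
import numpy as np

def complex_gemm_via_real(a, b):
    # (Ar + i*Ai) @ (Br + i*Bi) = (Ar@Br - Ai@Bi) + i*(Ar@Bi + Ai@Br)
    ar, ai = a.real, a.imag
    br, bi = b.real, b.imag
    real = ar @ br - ai @ bi
    imag = ar @ bi + ai @ br
    return real + 1j * imag

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))
b = rng.normal(size=(4, 5)) + 1j * rng.normal(size=(4, 5))
print(np.allclose(complex_gemm_via_real(a, b), a @ b))  # True
```

Each of the four real products could then reuse an existing real GEMM kernel, at the cost of materializing the split real/imaginary parts.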