Awni Hannun
Usually the best way to debug these sorts of issues is to have a reference implementation (like the HF one) and the MLX implementation side-by-side. Then write some wrapper to...
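A minimal sketch of such a wrapper (the helper names here are illustrative, not from the original comment): record every named submodule's output from the PyTorch/HF reference, then compare against activations collected from the MLX port to find the first layer that diverges.

```
import numpy as np
import torch

def capture_reference(model, x):
    """Run the HF/PyTorch model once, recording each named module's output."""
    outputs = {}

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                outputs[name] = output.detach().float().cpu().numpy()
        return hook

    handles = [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return outputs

def compare(ref_outputs, mlx_outputs, tol=1e-3):
    """mlx_outputs: dict of name -> array collected by hand in the MLX model
    (MLX has no hook API, so stash activations inside __call__)."""
    for name, ref in ref_outputs.items():
        if name in mlx_outputs:
            diff = np.abs(ref - np.asarray(mlx_outputs[name])).max()
            marker = "  <-- first divergence?" if diff > tol else ""
            print(f"{name}: max abs diff = {diff:.3e}{marker}")
```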
> when I changed to approx="precise" or implemented the transformers FastGELUActivation the difference came down to ~3

What's that number mean? ~3 would be a large value for the max-abs-diff...
I'm not quite following the GELU story. But I think the safest call is to find the GELU implementation used by the reference (presumably the JAX code) and use that...
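For what it's worth, a quick way to see how much the variants disagree; this is a minimal sketch assuming the current `mlx.nn` function names (`gelu`, `gelu_approx`, `gelu_fast_approx`), which correspond to `nn.GELU`'s `approx="none" / "precise" / "fast"` options:

```
import mlx.core as mx
import mlx.nn as nn

x = mx.random.normal((4096,))

exact = nn.gelu(x)             # erf-based GELU
precise = nn.gelu_approx(x)    # approx="precise"
fast = nn.gelu_fast_approx(x)  # approx="fast"

print("precise vs exact:", mx.abs(precise - exact).max())
print("fast vs exact:   ", mx.abs(fast - exact).max())
```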
Any luck getting to the bottom of this? FWIW, it's expected that there will be some numerical differences between the MLX and PyTorch versions. Rather than looking at the sum (which is...
What are the formulas for these?

```
Relative Distance (using norms): 2.5038335e-05
Max Absolute Relative Difference: 1.6712433
```

> Yesterday, I tried using the huggingface VLM class in my implementation...
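Guessing at the definitions (these are assumptions about the metrics, not the poster's confirmed code), the two numbers are consistent with:

```
import numpy as np

def relative_distance(a, b):
    # ||a - b|| / ||b||: a global, norm-based measure.
    return np.linalg.norm(a - b) / np.linalg.norm(b)

def max_abs_relative_diff(a, b, eps=1e-12):
    # max |a - b| / |b|: per-element, so a single near-zero reference
    # entry can make this huge even when the norm-based distance is
    # tiny, which would explain ~1.67 alongside ~2.5e-5.
    return np.max(np.abs(a - b) / (np.abs(b) + eps))
```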
I couldn't say what the issue is. I'll try to take a deeper look in the next few days.
It would be pretty cool to add this and perhaps not too difficult. I believe function calling requires a few things:

- A model which supports the function calling prompt...
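On the prompt side, many chat templates in recent `transformers` versions accept a `tools` argument; a rough sketch (the model name and tool schema below are placeholders, not from any existing implementation):

```
from transformers import AutoTokenizer

# Placeholder tool schema and model name, for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

tokenizer = AutoTokenizer.from_pretrained("some/function-calling-model")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[weather_tool],
    add_generation_prompt=True,
    tokenize=False,
)
# The generated text would then be parsed for a tool call, the tool run,
# and its result appended as a "tool" role message for a second pass.
```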
Could you share the model you are using (e.g. `phi3_path`)?
I updated the [MLX Community Phi-3](https://huggingface.co/collections/mlx-community/phi-3-66280e1b1635e4d2d55f7c22) models to use the correct eos token. Also, you can always specify the eos token in the `load` function like so:

```
load("model_name", tokenizer_config={"eos_token":...
```
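For example, a hypothetical end-to-end call with `mlx_lm` (the model path and the assumption that Phi-3's end-of-turn token is `<|end|>` are illustrative):

```
from mlx_lm import load, generate

# Assumed model path and eos token, shown for illustration.
model, tokenizer = load(
    "mlx-community/Phi-3-mini-4k-instruct-4bit",
    tokenizer_config={"eos_token": "<|end|>"},
)
print(generate(model, tokenizer, prompt="Hello"))
```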
@Blaizzy this diff includes some previous PRs. Can you rebase on origin's main and force-push to your branch?
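Concretely, that would be something like (`<your-branch>` is a placeholder):

```
git fetch origin
git rebase origin/main
git push --force-with-lease origin <your-branch>
```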