Prince Canuma
I even copied the transformers GELU activation in numpy to compare, but I get results similar to the `precise` approximation in MLX.
I did exactly that. Here are the implementations I tried, all of which are identical to the ones used in transformers and JAX:

```python
class FastGELUActivation(nn.Module):
    """
    Applies GELU approximation...
```
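For context, a minimal numpy sketch comparing the erf-based ("precise") GELU against the tanh approximation that the fast variants use (function names here are illustrative, not from the actual implementation):

```python
import math
import numpy as np

def gelu_exact(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-4.0, 4.0, 1001).astype(np.float32)
max_diff = np.abs(gelu_exact(x) - gelu_tanh(x)).max()
print(max_diff)  # small but nonzero, so the choice of variant does show up in diffs
```

The point of the comparison: the two variants genuinely differ by a small amount per element, so a mismatch at this layer alone would be tiny, not the multi-unit total diffs seen downstream.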
Yet the sum of absolute differences is around 2.39 and 3.77 on the vision path, and the model still refuses a lot. From the start till the first MLP everything is...
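Worth noting: a raw sum of absolute differences grows with the number of elements, so a total of a few units can still mean a tiny per-element error. A sketch with made-up shapes (the real tensor sizes aren't stated in the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 1152)).astype(np.float32)  # illustrative shape
b = a + rng.standard_normal(a.shape).astype(np.float32) * 1e-5  # tiny perturbation

total = np.abs(a - b).sum()      # scales with element count
per_elem = np.abs(a - b).mean()  # size-independent, easier to interpret
print(total, per_elem)
```

Here a per-element error on the order of 1e-5 already produces a total in the low units, which is why a size-normalized metric is more informative.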
I'm not sure what I'm missing here. Let me go for a walk 🚶🏾♂️...
Not yet. Yesterday I tried using the Hugging Face VLM class in my implementation, but that didn't change the results. Let me check the relative distance and let you know.
@awni here are the results:

Language Model (embedding output):
```
Relative Distance (using norms): 0.0
Max Absolute Relative Difference: 0.0
Are Matrices Close (np.allclose): True
```

Vision Model (`patch_embedding` output):...
> What are the formulas for these?

```python
def relative_diff(x1, x2):
    assert x1.shape == x2.shape, "Matrices must have the same dimensions"
    if x1.ndim > 2 or x2.ndim > 2:
        x1...
```
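Since the function above is truncated, here is one common norm-based definition of these metrics as a self-contained sketch (an assumption; the thread's exact formulas are cut off):

```python
import numpy as np

def relative_distance(x1, x2):
    # Norm-based relative distance: ||x1 - x2||_F / ||x1||_F
    return np.linalg.norm(x1 - x2) / np.linalg.norm(x1)

def max_abs_relative_diff(x1, x2, eps=1e-12):
    # Largest elementwise |x1 - x2| / (|x1| + eps); eps avoids division by zero
    return np.max(np.abs(x1 - x2) / (np.abs(x1) + eps))

a = np.ones((4, 4), dtype=np.float32)
print(relative_distance(a, a), max_abs_relative_diff(a, a))  # 0.0 0.0 for identical inputs
```

Both metrics are 0.0 exactly when the two tensors match, which is consistent with the embedding-output numbers reported above.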
Ok, after some deeper debugging, I think the issue is in the multimodal feature merging and/or masking. I'll update you once I have it working.
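The merging step in question typically scatters vision features into the text embedding sequence at the positions of image placeholder tokens. A minimal sketch of that pattern (hypothetical helper, not the actual implementation):

```python
import numpy as np

def merge_multimodal(input_embeds, image_features, image_token_mask):
    """Scatter image features into the text embedding sequence.

    Positions where image_token_mask is True are overwritten, in order,
    with the flattened image features. A subtle off-by-one in the mask
    or a wrong feature ordering here silently corrupts the prompt.
    """
    merged = input_embeds.copy()
    merged[image_token_mask] = image_features.reshape(-1, image_features.shape[-1])
    return merged

seq_len, hidden = 8, 4
embeds = np.zeros((seq_len, hidden), dtype=np.float32)
feats = np.ones((3, hidden), dtype=np.float32)            # 3 image patches
mask = np.array([False, True, True, True] + [False] * 4)  # image token positions
out = merge_multimodal(embeds, feats, mask)
print(out[mask].sum())  # 12.0: the three masked slots now hold the features
```

Comparing the merged embeddings (rather than the raw vision features) against the reference implementation is a quick way to localize a bug in exactly this step.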
@awni @lucasb-eyer I did everything by the book, but the model still doesn't behave properly. It seems to behave better only when using multimodal features from the transformers model....
@awni this weird behaviour also happened with `Idefics2` in the past. The only thing these models have in common is that they use F32 precision.
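To illustrate why the F32 requirement could matter: at half precision, values lose low-order bits very quickly, so a model trained and evaluated in F32 can drift noticeably when run in F16. A tiny demo of the rounding:

```python
import numpy as np

# float16 has a 10-bit mantissa, so at magnitude 2048 the spacing between
# representable values is 2.0 and adding 1.0 is lost to round-to-even.
big, small = np.float16(2048.0), np.float16(1.0)
print(big + small)                          # 2048.0 in half precision
print(np.float32(big) + np.float32(small))  # 2049.0 in single precision
```

If the weights or activations of these models rely on that extra precision, casting them down would explain small per-layer diffs compounding into bad generations.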