mlx-vlm Add Support for OS-Copilot/OS-Atlas-Base-7B

@Blaizzy Can you help me run OS-Copilot/OS-Atlas-Base-7B? I tried converting it to MLX with 8-bit quantization, but I can't get the same accuracy as the original model here: https://huggingface.co/spaces/maxiw/OS-ATLAS What could I be doing wrong? The command I used: mlx_vlm.convert --hf-path OS-Copilot/OS-Atlas-Base-7B -q --q-bits 8

Mar 06 '25 20:03 prncvrm

More details here: https://github.com/OS-Copilot/OS-Atlas/issues/51

Mar 07 '25 14:03 prncvrm

Hey,

I just tried it.

It works well on demo samples but fails with custom UIs

Check the screen resolution they are using and the prompting strategy

Mar 07 '25 15:03 Blaizzy

I checked that. They're resizing, but that still doesn't help me. @Blaizzy, is there any guidance you could provide?

It also fails on the same image, https://maxiw-os-atlas.hf.space/gradio_api/file=/tmp/gradio/fc0a8ef05b952a970924913eea2b89a8d626c92031f94ff5f3e6ccaf8dd23a4e/web_6f93090a-81f6-489e-bb35-1a2838b18c01.png, attached here. Specifically, the y coordinate is wrong.

Mar 08 '25 06:03 prncvrm

Qwen2-VL needs its bounding boxes normalised to 1000

Mar 08 '25 06:03 Blaizzy
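
A minimal sketch of the normalisation mentioned above, assuming the model emits boxes as (x1, y1, x2, y2) in a 0 to 1000 range. The function name and the box layout are illustrative assumptions, not part of mlx-vlm or OS-Atlas.

```python
def denormalize_bbox(bbox, image_width, image_height, scale=1000):
    """Map an (x1, y1, x2, y2) box from the 0..scale range to pixel coordinates.

    Hypothetical helper: the box layout and the default scale of 1000 are
    assumptions based on the discussion in this thread.
    """
    x1, y1, x2, y2 = bbox
    return (
        x1 / scale * image_width,   # x values scale with the image width
        y1 / scale * image_height,  # y values scale with the image height
        x2 / scale * image_width,
        y2 / scale * image_height,
    )


# Example: a box on a 1920x1080 screenshot
pixel_box = denormalize_bbox((500, 250, 1000, 500), 1920, 1080)
```

Note that x and y are scaled by different image dimensions, so the width and height passed in should be those of the image the coordinates will be drawn on.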

I've been doing that, @Blaizzy. I get the x-axis correct; only the y-axis is wrong.

Mar 08 '25 15:03 prncvrm

Also, when you say you tried it on the demo samples here https://github.com/Blaizzy/mlx-vlm/issues/229#issuecomment-2706806607, did you mean you used the original Hugging Face model or one converted via MLX?

Mar 08 '25 17:03 prncvrm

This is from the Hugging Face Spaces logs, vs.

this is using MLX.

Could you take a look and see if you can help?

Mar 08 '25 17:03 prncvrm

By the way, I kept the original image size on Hugging Face Spaces. I modified the code to keep the original image, as you can see in the logs.

Mar 08 '25 17:03 prncvrm

@Blaizzy any directions you could help me with?

Mar 10 '25 08:03 prncvrm

The same happens with https://huggingface.co/osunlp/UGround-V1-7B too. Maybe I'm missing something?

Mar 10 '25 10:03 prncvrm

@Blaizzy any pointers?

Mar 11 '25 08:03 prncvrm

@Blaizzy I've been stuck on this forever. Is there any direction you could point me in?

Mar 12 '25 10:03 prncvrm

Hey Prince

I have my plate full,

I have given you all the pointers you need. This is a great opportunity for you to learn.

From the images you showed above, the bounding boxes from HF and MLX have a high IoU (Intersection over Union) of more than 60%, which is good.

Mar 12 '25 11:03 Blaizzy
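
For reference, IoU between two boxes can be computed as in the sketch below. It assumes axis-aligned boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2; the function name is illustrative and not taken from either library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes
    return intersection / union if union > 0 else 0.0
```

Comparing the HF and MLX predictions for the same prompt and image with a function like this gives a single number to track while debugging, provided both boxes are in the same coordinate system.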

I'm closing this issue for now.

Please feel free to re-open if and when you find a reproducible bug or problem with MLX.

Mar 12 '25 11:03 Blaizzy

Just one thing: is this IoU expected between HF and MLX? @Blaizzy

Mar 13 '25 05:03 prncvrm

https://github.com/huggingface/transformers/blob/87b30c35892568f9b83d4e8d1233956b8e0cd96c/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1708 I believe we're not calculating the RoPE indices in MLX-VLM, which is causing the problem. Once I comment out this section in transformers, I get the same issue there as well, @Blaizzy, BTW.

Mar 13 '25 11:03 prncvrm
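
To illustrate what the RoPE indices are, here is a simplified pure-Python sketch of the multimodal position ids Qwen2-VL uses: text tokens get the same position on all three axes (temporal, height, width), while image tokens get positions that follow the image grid. This is not the transformers or mlx-vlm code. The function name and the `segments` input format are assumptions made for illustration, and batching, padding, and attention masks are ignored.

```python
def mrope_position_ids(segments):
    """Build (temporal, height, width) position ids for a token sequence.

    segments: list of ("text", n_tokens) or ("image", (t, h, w)) entries in
    sequence order, where (t, h, w) is the image token grid after spatial
    merging. Hypothetical input format, for illustration only.
    """
    t_ids, h_ids, w_ids = [], [], []
    next_pos = 0  # first position id available to the next segment

    for kind, spec in segments:
        if kind == "text":
            # Text tokens: identical, sequential ids on all three axes
            for i in range(spec):
                t_ids.append(next_pos + i)
                h_ids.append(next_pos + i)
                w_ids.append(next_pos + i)
            next_pos += spec
        else:
            # Image tokens: ids follow the grid, offset by the current position
            t, h, w = spec
            for ti in range(t):
                for hi in range(h):
                    for wi in range(w):
                        t_ids.append(next_pos + ti)
                        h_ids.append(next_pos + hi)
                        w_ids.append(next_pos + wi)
            # The next segment starts after the largest id used by the image
            next_pos += max(t, h, w)

    return t_ids, h_ids, w_ids


# Two text tokens, a 1x2x2 image grid, then one more text token
t_ids, h_ids, w_ids = mrope_position_ids(
    [("text", 2), ("image", (1, 2, 2)), ("text", 1)]
)
```

In this scheme the height ids are the only signal that distinguishes rows of image tokens, so plain sequential position ids would discard that information.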

@Blaizzy Can you re-open this issue? There seems to be a problem, as stated above. I'm working on a patch in the meantime.

Mar 17 '25 11:03 prncvrm

@Blaizzy Also, can you confirm whether the issue described above is correct?

Mar 18 '25 06:03 prncvrm

@Blaizzy Can you review this PR: https://github.com/Blaizzy/mlx-vlm/pull/319? The issue has been handled there.

Apr 21 '25 14:04 prncvrm