Evaluation and Finetuning Scripts

Open srvmishra opened this issue 7 months ago • 1 comment

Hello @jwyang. Thank you for the amazing work.

I am trying to finetune the Magma model on the following datasets for now:

I also want to evaluate the resulting finetuned model on the test splits of the above datasets. For that, I need to handle the following:

1. Configuring Magma's input and output formats to be compatible with the above datasets.
2. Annotating the UI images with Set-of-Mark (SoM) markers.
3. Making the evaluation scripts of the above datasets compatible with the Magma model.

Here is our current approach:

For the output format, we plan to obtain structured output from Magma in JSON format by applying specific prompt templates. We will parse this output in Python and use the parsed result to take actions. As for the input, the inference code suggests that Magma takes input the same way as any other VLM.
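
A rough sketch of the parsing step we have in mind. Everything here is our own assumption, not part of Magma: the prompt wording, the JSON schema (`element_id`, `operation`, `value`), and the names `PROMPT_TEMPLATE`, `Action`, and `parse_action` are hypothetical. Only the CLICK/TYPE/SELECT operation set is borrowed from Mind2Web's action space.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical prompt template (our own wording, not a Magma convention).
# Doubled braces survive str.format() as literal JSON braces.
PROMPT_TEMPLATE = (
    "Task: {task}\n"
    "Previous actions: {history}\n"
    "Answer with one JSON object: "
    '{{"element_id": <mark number>, "operation": "CLICK" | "TYPE" | "SELECT", '
    '"value": <text or null>}}'
)

# Operation set borrowed from Mind2Web's action space.
VALID_OPERATIONS = {"CLICK", "TYPE", "SELECT"}


@dataclass
class Action:
    element_id: int
    operation: str
    value: Optional[str] = None


def parse_action(response: str) -> Optional[Action]:
    """Extract and validate the first JSON object in a model response.

    Returns None when no valid action can be recovered, so the caller can
    score the step as failed instead of crashing.
    """
    # The model may wrap the JSON in a ```json fence or in prose, so take the
    # first {...} span. Non-greedy matching assumes a flat object (no nesting).
    match = re.search(r"\{.*?\}", response, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    operation = str(data.get("operation", "")).upper()
    if operation not in VALID_OPERATIONS:
        return None
    try:
        element_id = int(data["element_id"])
    except (KeyError, TypeError, ValueError):
        return None
    value = data.get("value")
    return Action(element_id, operation, None if value is None else str(value))


prompt = PROMPT_TEMPLATE.format(task="Find a one-way flight to Boston", history="None")
```

Returning None instead of raising lets malformed generations be counted as failed steps during evaluation rather than aborting the run.
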
For SoM annotation, I wrote code to annotate UI images with SoM markers following your comments on another issue (som), but I kept the default parameter values. I observed multiple boxes for the same element where there should have been only one. Since you most likely used a similar SoM implementation to annotate the Mind2Web data you released on Hugging Face, could you share the SoM parameter values you used? Alternatively, if there is a newer version of the SoM generation script, could you share it with us? We want to start finetuning from the raw data itself.
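
As a stopgap for the duplicate boxes, we are considering a greedy, NMS-style suppression keyed on box area (the boxes carry no confidence scores), followed by numbering the surviving boxes in reading order. This is our own hypothetical post-processing, not the SoM implementation used for Magma; the 0.5 IoU threshold and the names `dedup_boxes` and `assign_marks` are placeholders.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dedup_boxes(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Visit boxes from largest to smallest and drop any box that overlaps
    an already-kept box with IoU above iou_threshold (hypothetical default)."""
    ordered = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    kept: List[Box] = []
    for box in ordered:
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


def assign_marks(boxes: List[Box]) -> Dict[int, Box]:
    """Number boxes 1..N by top-left corner: top-to-bottom, then left-to-right."""
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return {i + 1: box for i, box in enumerate(ordered)}
```

Tuning the threshold trades off merging genuinely distinct, tightly packed elements against leaving duplicates in place, which is why the original parameter values would still be valuable.
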
For evaluation, Mind2Web provides its own evaluation scripts, but we are unsure whether we can simply plug the finetuned Magma model into them, or whether we should instead finetune Magma following the Mind2Web instructions.
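
To test compatibility, we would map parsed predictions onto Mind2Web-style step metrics: element accuracy, operation F1, and step success rate, where a step counts as successful only if both the element and the operation are correct. The sketch below is a simplified re-implementation under our own assumptions (the hypothetical `Step` record, whitespace tokenization, lowercasing), not Mind2Web's official evaluation script.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    element_id: int  # id of the target element (e.g. its SoM mark)
    operation: str   # "CLICK", "TYPE" or "SELECT"
    value: Optional[str] = None


def op_string(step: Step) -> str:
    """Serialize operation and value, e.g. 'TYPE Boston'; CLICK has no value."""
    return step.operation if not step.value else f"{step.operation} {step.value}"


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two lowercased, whitespace-tokenized strings."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def step_metrics(preds: List[Optional[Step]], golds: List[Step]) -> Dict[str, float]:
    """Average per-step metrics. A None prediction (unparseable model output)
    scores zero on every metric."""
    assert len(preds) == len(golds), "expected one prediction per gold step"
    if not golds:
        return {"element_acc": 0.0, "op_f1": 0.0, "step_sr": 0.0}
    elem, f1, success = 0.0, 0.0, 0.0
    for pred, gold in zip(preds, golds):
        if pred is None:
            continue
        elem_ok = pred.element_id == gold.element_id
        op_f1 = token_f1(op_string(pred), op_string(gold))
        elem += elem_ok
        f1 += op_f1
        # Success needs the right element and an exactly matching operation.
        success += elem_ok and op_f1 == 1.0
    n = len(golds)
    return {"element_acc": elem / n, "op_f1": f1 / n, "step_sr": success / n}
```

Keeping the metric code separate from the model means the same function can score Magma and any baseline, provided both emit the same parsed action format.
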

We would appreciate your thoughts on the above. It would be very helpful if you could release the finetuning code along with data preparation instructions; that would clear up a lot for us. Could you also provide some simple tutorials for the trainer and data loader files explaining what each function does? That would help us build custom pipelines on top of Magma.

May 16 '25 06:05 srvmishra

I am also interested in the questions raised.

May 19 '25 09:05 0bi0n3