
Does the model support fine-tuning?

Open · bolt163 opened this issue 5 months ago · 3 comments

As the title says: if someone wants to fine-tune the model, what is the process?

bolt163 · Jul 17 '25 06:07

OmniParser uses Florence-2 (or blip2-opt-2.7b) for captioning details and a custom-trained YOLO model for icon detection. You can fine-tune both models as you like. For example, I just trained the YOLO model on my own custom dataset. But be aware that only one class is embedded, and that's the icon label.
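Because the shipped detector has a single class, a custom dataset can put every box under class 0. Below is a minimal sketch, assuming your annotations are pixel boxes (x1, y1, x2, y2) and that you use the standard Ultralytics YOLO label format (one `class x_center y_center width height` line per box, normalized to 0..1). The helper names, the `images/train` / `images/val` layout and the class name `icon` are hypothetical; adjust them to your data.

```python
from pathlib import Path


def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x1, y1, x2, y2) into one YOLO label line.

    The detector has a single class, so every box defaults to class 0.
    """
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w  # normalized box centre x
    yc = (y1 + y2) / 2 / img_h  # normalized box centre y
    w = (x2 - x1) / img_w       # normalized width
    h = (y2 - y1) / img_h       # normalized height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def dataset_yaml_text(root, class_name="icon"):
    """Build a minimal Ultralytics dataset config with exactly one class.

    Assumed layout (hypothetical): root/images/{train,val} with matching
    root/labels/{train,val} folders holding one .txt file per image.
    """
    return (
        f"path: {root}\n"
        "train: images/train\n"
        "val: images/val\n"
        "names:\n"
        f"  0: {class_name}\n"
    )


if __name__ == "__main__":
    print(to_yolo_label((10, 20, 50, 60), 100, 100))
    Path("my_icons.yaml").write_text(dataset_yaml_text("/data/icons"))
```

Each image gets a `.txt` file of such lines; keeping a single class means the fine-tuned weights stay a drop-in replacement for the original detector.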

You can then replace the YOLO *.pt model in the weights/icon_detect folder with your own fine-tuned one. For training, just use the provided train_args.yaml and override settings such as device and batch.
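A minimal sketch of that workflow with the `ultralytics` package, assuming the detector file is `weights/icon_detect/model.pt` and that `train_args.yaml` sits next to it (both paths are assumptions, so check your checkout). Passing `cfg=` to `train()` loads the yaml as the base configuration, and keyword arguments override it. The provided yaml may still contain paths from the original training machine (for example `data` or `save_dir`), so review it before training. `my_icons.yaml`, `build_train_overrides` and `install_weights` are hypothetical names.

```python
import shutil
from pathlib import Path

# Assumed location of the detector weights and the provided train_args.yaml.
ICON_DETECT_DIR = Path("weights/icon_detect")


def build_train_overrides(data_yaml, device="0", batch=16, epochs=None):
    """Collect the settings that should override train_args.yaml."""
    overrides = {"data": str(data_yaml), "device": device, "batch": batch}
    if epochs is not None:
        overrides["epochs"] = epochs
    return overrides


def install_weights(trained_pt, target_pt):
    """Back up the original weights, then copy the fine-tuned ones in place."""
    trained_pt, target_pt = Path(trained_pt), Path(target_pt)
    if target_pt.exists():
        shutil.copy2(target_pt, target_pt.with_name(target_pt.name + ".bak"))
    shutil.copy2(trained_pt, target_pt)
    return target_pt


def main():
    from ultralytics import YOLO  # pip install ultralytics

    # Start from the shipped detector (assumed file name) and fine-tune it.
    model = YOLO(str(ICON_DETECT_DIR / "model.pt"))
    overrides = build_train_overrides("my_icons.yaml", device="0", batch=8)
    # cfg= supplies the base settings; the keyword overrides win.
    model.train(cfg=str(ICON_DETECT_DIR / "train_args.yaml"), **overrides)
    # The trainer records the path of the best checkpoint from this run.
    install_weights(model.trainer.best, ICON_DETECT_DIR / "model.pt")


if __name__ == "__main__":
    main()
```

Keeping a `.bak` copy makes it easy to switch back to the original detector if the fine-tuned one performs worse on your screens.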

Same for the Florence-2 model: fine-tune it and replace the *.safetensors file in weights/icon_caption_florence.
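A minimal fine-tuning sketch with Hugging Face `transformers`, assuming the captioner is loaded from `weights/icon_caption_florence` with `trust_remote_code=True`, uses the `microsoft/Florence-2-base` processor, and captions icons with the `<CAPTION>` task token. These are assumptions, not confirmed OmniParser internals; if loading the local folder fails, try starting from `microsoft/Florence-2-base` instead. `samples` is a hypothetical list of `(PIL.Image in RGB, caption)` pairs, one per cropped icon, and the learning rate is only a conservative starting point.

```python
from pathlib import Path

# Assumptions (verify against your checkout): captioner weights location and
# the task token used when captioning icon crops.
CAPTION_DIR = Path("weights/icon_caption_florence")
TASK_PROMPT = "<CAPTION>"


def build_batch_text(captions, prompt=TASK_PROMPT):
    """Return one task prompt per sample plus the cleaned target captions."""
    prompts = [prompt] * len(captions)
    targets = [c.strip() for c in captions]
    return prompts, targets


def finetune(samples, epochs=1, lr=1e-6, batch_size=8, out_dir="florence_ft"):
    """Fine-tune on (PIL.Image, caption) pairs and save the result to out_dir."""
    # Heavy imports stay inside so the helper above works without torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(
        "microsoft/Florence-2-base", trust_remote_code=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        str(CAPTION_DIR), trust_remote_code=True
    ).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for i in range(0, len(samples), batch_size):
            chunk = samples[i : i + batch_size]
            prompts, targets = build_batch_text([cap for _, cap in chunk])
            inputs = processor(
                text=prompts, images=[img for img, _ in chunk], return_tensors="pt"
            ).to(device)
            labels = processor.tokenizer(
                targets, return_tensors="pt", padding=True
            ).input_ids.to(device)
            # Exclude padding positions from the loss.
            labels[labels == processor.tokenizer.pad_token_id] = -100
            loss = model(
                input_ids=inputs["input_ids"],
                pixel_values=inputs["pixel_values"],
                labels=labels,
            ).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    # Recent transformers versions save weights as model.safetensors by default.
    model.save_pretrained(out_dir)
    return Path(out_dir)
```

After training, copy the new `model.safetensors` from `out_dir` over the one in `weights/icon_caption_florence` (keep a backup of the original).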

I can provide some code if needed.

tuke307 · Jul 17 '25 15:07

Hi Tony, thanks very much!

bolt163 · Jul 18 '25 02:07

If you could provide an example, that would be great. Thank you!

directorscut82 · Aug 04 '25 15:08