EVE
Finetune on downstream tasks
Hello,
how can this model be further finetuned for downstream tasks such as object localization?
Good question.
For object localization tasks, we suggest directly outputting the bounding box coordinates.
Two important notes are: (1) Replace the original 1D RoPE with 2D RoPE to better capture spatial relationships. (2) Use dynamic resolution by feeding the actual input image dimensions when representing bounding boxes, points, and other spatial features. This helps the model inherently learn scale information, improving its ability to handle images at different resolutions.
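To make note (1) concrete, here is a minimal NumPy sketch of one common way to build a 2D RoPE. It splits each feature vector in half, rotates the first half by the patch's row index and the second half by its column index. This is an illustration under stated assumptions, not EVE's actual implementation: the function names `rope_1d` and `rope_2d`, the half/half channel split, and the `base` value are all hypothetical choices. The `height` and `width` arguments are the patch grid of the actual input image, so the same function covers images of different resolutions.

```python
import numpy as np


def rope_1d(x, pos, base=10000.0):
    """Rotate channel pairs of x (n, d) by angles derived from positions pos (n,)."""
    d = x.shape[-1]
    assert d % 2 == 0, "feature dimension must be even"
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,) rotation frequencies
    angles = pos[:, None] * inv_freq[None, :]      # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x, dtype=float)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


def rope_2d(x, height, width, base=10000.0):
    """Apply an axial 2D RoPE to patch features x of shape (height * width, d).

    Patches are assumed to be in row-major order. The first d/2 channels
    encode the row index and the last d/2 channels encode the column index.
    """
    n, d = x.shape
    assert n == height * width, "x must hold one vector per patch"
    assert d % 4 == 0, "feature dimension must be divisible by 4"
    rows = np.repeat(np.arange(height), width)     # row index of each patch
    cols = np.tile(np.arange(width), height)       # column index of each patch
    half = d // 2
    return np.concatenate(
        [rope_1d(x[:, :half], rows, base), rope_1d(x[:, half:], cols, base)],
        axis=-1,
    )


# Usage: a 3 x 4 patch grid with 8-dimensional query/key vectors.
rng = np.random.default_rng(0)
queries = rope_2d(rng.standard_normal((12, 8)), height=3, width=4)
keys = rope_2d(rng.standard_normal((12, 8)), height=3, width=4)
scores = queries @ keys.T                          # (12, 12) attention logits
```

With this layout, the contribution of position to a query-key dot product depends only on the row and column offsets between the two patches, which is the spatial relationship that a 1D position index over the flattened sequence does not expose directly.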