moondream
Questions about the visual grounding task
The dataset format for the visual grounding task is not clearly described in your repo. Could you provide clearer instructions? Alternatively, how should I fine-tune the model on my own dataset for the visual grounding task?