recognize-anything
How to fine-tune RAM++ using an object detection dataset without image caption data
I have an object detection dataset that only contains bbox and class annotations. How can I use this dataset to train RAM++?
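
To make the data situation concrete, here is a minimal sketch of how box-level labels could be collapsed into image-level tag lists. It assumes COCO-style annotation dictionaries (`image_id`, `category_id`, `bbox`); the function name `boxes_to_image_tags` and the toy records are hypothetical and not part of the recognize-anything repository. Whether RAM++ fine-tuning can consume such tag-only records without captions is exactly what is being asked.

```python
from collections import defaultdict


def boxes_to_image_tags(annotations, categories):
    """Collapse per-box class labels into a sorted, de-duplicated tag list per image.

    Assumes COCO-style dicts; box coordinates are ignored because only
    image-level tags are kept.
    """
    id_to_name = {c["id"]: c["name"] for c in categories}
    tags = defaultdict(set)
    for ann in annotations:
        tags[ann["image_id"]].add(id_to_name[ann["category_id"]])
    return {image_id: sorted(names) for image_id, names in tags.items()}


# Hypothetical toy data, for illustration only.
categories = [{"id": 1, "name": "person"}, {"id": 2, "name": "dog"}]
annotations = [
    {"image_id": 10, "category_id": 1, "bbox": [5, 5, 50, 80]},
    {"image_id": 10, "category_id": 2, "bbox": [60, 40, 30, 20]},
    {"image_id": 10, "category_id": 1, "bbox": [100, 5, 40, 75]},
    {"image_id": 11, "category_id": 2, "bbox": [0, 0, 20, 20]},
]

image_tags = boxes_to_image_tags(annotations, categories)
print(image_tags)
```

Each image ends up with a set of class names and no caption text, which is all a detection dataset can provide directly.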