keras-io
ViT cannot detect multiple objects in one image
The Object Detection with Vision Transformers example can only detect one object per image. I ran the model's prediction on an image containing many objects of the same kind, and only one large bounding box covering all of them was drawn, instead of one bounding box per object. Please correct me if I am wrong.
The model is designed to output a single bounding box: the last layer is a dense layer with 4 units, `layers.Dense(4)(features)`, which outputs the two coordinate pairs for the top-left and bottom-right corners of the predicted bounding box.
So the model will always spit out only one bounding box, no matter how many objects you have in your picture.
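To make the point concrete, here is a minimal plain-Python sketch of what a 4-unit dense head does. It is a stand-in for `layers.Dense(4)`, not the example's actual code: the `dense` helper, the feature size of 8, the random weights, and the `x_min, y_min, x_max, y_max` ordering are all assumptions made for illustration.

```python
import random

def dense(features, weights, bias):
    """Plain-Python stand-in for a dense layer (hypothetical helper):
    out[j] = sum_i features[i] * weights[i][j] + bias[j]."""
    return [
        sum(f * w_row[j] for f, w_row in zip(features, weights)) + bias[j]
        for j in range(len(bias))
    ]

random.seed(0)
feature_dim = 8  # assumed size of the feature vector fed to the head
units = 4        # mirrors layers.Dense(4); ordering x_min, y_min, x_max, y_max is assumed

weights = [[random.uniform(-1, 1) for _ in range(units)] for _ in range(feature_dim)]
bias = [0.0] * units

# Hypothetical feature vectors for two different images:
# one showing a single object, one showing many objects.
features_one_object = [random.random() for _ in range(feature_dim)]
features_many_objects = [random.random() for _ in range(feature_dim)]

box_a = dense(features_one_object, weights, bias)
box_b = dense(features_many_objects, weights, bias)

# The head emits exactly 4 numbers (one box) in both cases.
print(len(box_a), len(box_b))  # 4 4
```

Whatever the image contains, the output has a fixed length of 4, so there is no room for a second box. Predicting a variable number of boxes would need a different output design, for example a fixed set of box slots each paired with a confidence score.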
I believe this example is misleading and borderline dishonest. It is not an object detection model. Such examples should be removed from the official repository.