ViT cannot detect multiple objects in one image

Open galax-count opened this issue 2 years ago • 3 comments

The "Object detection with Vision Transformers" example can only detect one object per image. When I ran the model's prediction on an image containing many instances of the same object, only one large bounding box covering all of them was drawn, instead of one bounding box per object. Please correct me if I am wrong.

galax-count, Jul 28 '22 06:07

The model is designed to output a single bounding box: the last layer is a dense layer with 4 units, layers.Dense(4)(features), which outputs the two coordinate pairs for the top-left and bottom-right corners of the predicted bounding box.

So the model will always output exactly one bounding box, no matter how many objects are in the picture.
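
To illustrate why a 4-unit head cannot return more than one box, here is a minimal sketch in plain Python rather than Keras. The `dense` and `make_head` helpers, the feature size of 8, and the random feature vectors are all hypothetical stand-ins for the example's encoder output and its layers.Dense(4) layer; this is not the example's actual code.

```python
import random


def dense(features, weights, bias):
    """Apply a dense (fully connected) layer: one weighted sum per output unit."""
    return [
        sum(f * w for f, w in zip(features, unit_weights)) + b
        for unit_weights, b in zip(weights, bias)
    ]


def make_head(num_features, units=4, seed=0):
    """Build random weights for a hypothetical head with `units` outputs."""
    rng = random.Random(seed)
    weights = [
        [rng.uniform(-1.0, 1.0) for _ in range(num_features)]
        for _ in range(units)
    ]
    bias = [0.0] * units
    return weights, bias


weights, bias = make_head(num_features=8)

# Hypothetical feature vectors standing in for the encoder output of two
# different images: one with a single object, one with many objects.
one_object = [random.Random(1).random() for _ in range(8)]
many_objects = [random.Random(2).random() for _ in range(8)]

box_a = dense(one_object, weights, bias)
box_b = dense(many_objects, weights, bias)

# Both outputs hold exactly 4 numbers (one box), whatever the image contains.
print(len(box_a), len(box_b))
```

The output length is fixed by the number of units in the head, not by the image content, so detecting several objects would need a different output design, for example a head that predicts a set of boxes.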

quentinfayet, Sep 26 '22 17:09

I believe this example is misleading and borderline dishonest. It is not an object detection model. Such examples should be removed from the official repository.

alekseisolovev, Mar 07 '23 10:03