
Accuracy with classes

xsacha opened this issue 2 years ago · 5 comments

Hi, I often see 'Giraffe' or other wild animals appear around ~70% when inferencing on images. I would just like to confirm I am using the correct class set (COCO_CATEGORIES from https://github.com/xiaofeng94/VL-PLM/blob/main/VL_PLM/data/datasets/coco_util.py )
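For reference, this is roughly how I'm turning predicted category ids into names. The entries below are illustrative only; the real list comes from `COCO_CATEGORIES` in the linked `coco_util.py`:

```python
# Sketch of my id -> name decoding, assuming each entry of COCO_CATEGORIES
# is a dict with "id" and "name" keys as in coco_util.py.
# The entries shown here are examples, not the full list.
COCO_CATEGORIES = [
    {"id": 1, "name": "person"},
    {"id": 21, "name": "bear"},
    {"id": 23, "name": "giraffe"},
]

id_to_name = {c["id"]: c["name"] for c in COCO_CATEGORIES}

def label_for(category_id):
    """Return the class name for a predicted category id."""
    return id_to_name.get(category_id, "<unknown id %d>" % category_id)
```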

Using pretrained LSJ model.

Here's an example with Bear @ 67%. The image above is the crop of the bounding box that was labelled Bear (21).

And here is a Giraffe (23).

As an aside, I realise flags may be an issue. In other models they are often identified as 'Kite'. Giraffe/Bear is a new one though and it seems to pop up a lot for all sorts of images.

It really seems like a mapping error on my side. I tried a picture of fruit and vegetables to work out the pattern. A picture of broccoli and celery comes back as 'Cup', carrot and capsicum as 'Fork', pumpkin and lemon as 'Wine Glass', apple as 'Tennis Racket', and potato as 'Baseball Bat'.

xsacha avatar Sep 28 '22 09:09 xsacha

Hey, which script did you use for your experiments?

Giraffe/Bear are base categories with ground truth in the COCO zero-shot setting, so the model may learn to predict Giraffe/Bear more often. For your two examples, the scores are not very high, which means the model is not confident in those predictions.

xiaofeng94 avatar Sep 28 '22 15:09 xiaofeng94

OK, I think I got it now. When I run the eval in the python script against COCO, it says: [09/29 09:18:20 d2.data.build]: Distribution of instances among all 65 categories:

I thought these were just the classes used for the eval, but they are also the only classes the model returns. When I change my mapping to use these classes, I get the correct results back. 'Bear' is actually 'Umbrella'.

The model just does 65 classes? How do we get access to the full 200 classes or unseen classes?
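In case it helps anyone else hitting this: since the model predicts contiguous indices into the 65-category eval list, the fix is just to decode with that list instead of the full one. A minimal sketch (the names and ordering below are illustrative, not the real label spaces):

```python
# Illustrative sketch of the label-space mismatch. Decoding a contiguous
# prediction index with the wrong list gives consistent, repeatable
# mislabels (e.g. 'Umbrella' showing up as 'Bear') rather than noise.
all_80 = ["person", "bicycle", "bear", "umbrella"]   # full COCO-style list
eval_65 = ["person", "umbrella", "bear"]             # subset the model uses

def decode(pred_index, label_space):
    """Map a contiguous prediction index to a class name."""
    return label_space[pred_index]

# The same index means different things in the two spaces:
# decode(1, all_80) gives "bicycle", while decode(1, eval_65) gives "umbrella".
```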

xsacha avatar Sep 28 '22 20:09 xsacha

To evaluate on new classes, you need to change the text embeddings used in the detector. In our codebase, the text embeddings are provided in a .json file, so you can make your own .json file with those new classes.
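As a rough sketch of assembling such a file: the schema below (class name mapped to an embedding list) and the random placeholder vectors are assumptions on my part; check the format of the .json files shipped with the repo and match it exactly, and generate the real vectors with the CLIP text encoder rather than at random.

```python
import json
import random

# Hypothetical sketch of writing a text-embedding .json for new classes.
# Schema and file name are placeholders; real embeddings would come from
# the CLIP text encoder, not random numbers.
new_classes = ["flag", "pumpkin", "capsicum"]
embed_dim = 512  # CLIP ViT-B/32 text embedding size

embeddings = {
    name: [random.random() for _ in range(embed_dim)]  # placeholder vectors
    for name in new_classes
}

with open("my_new_classes.json", "w") as f:
    json.dump(embeddings, f)
```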

BTW, models trained with PLs (pseudo labels) for COCO novel categories may not generalize well to other unseen categories. You may get better results if you generate PLs for more categories and train a detector on them. See Table 2 and the corresponding sections in our paper.

xiaofeng94 avatar Sep 29 '22 14:09 xiaofeng94

@xiaofeng94 I was wondering if you could provide more information about how to change the text encoder. The ./datasets/coco/annotations/open_voc/instances_eval.json file already contains the classes I want, and it is the file referenced by the yaml config, but it does not get used.

Do I just modify 'BASE_CATEGORIES' and 'EVAL_CATEGORIES' to some subset of the text embeddings present in the JSON? I tried this and got a KeyError in the eval.

xsacha avatar Dec 01 '22 04:12 xsacha

@xsacha Thanks for your interest in our work. The get_embedding function for the dataset is implemented here. If you directly change the label space to a subset of the original one, you will need to remap the embeddings to match your new label space; that means modifying dataset.py and the config. Please let us know if you have further questions.
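To make the remapping concrete, here is a minimal sketch (names, the two-dimensional embeddings, and the loading step are placeholders, not our actual code): given the full name-to-embedding table, rebuild the embedding matrix in the order of your subset label space, so row i corresponds to class id i.

```python
# Hypothetical sketch: re-index text embeddings so that row i of the
# matrix corresponds to class i of the new (subset) label space.
full_embeddings = {            # placeholder; normally loaded from the .json
    "person":   [0.1, 0.2],
    "umbrella": [0.3, 0.4],
    "bear":     [0.5, 0.6],
}
subset = ["bear", "umbrella"]  # the label space you actually evaluate on

remapped = [full_embeddings[name] for name in subset]
# remapped[0] is now the embedding for "bear", matching label id 0.
```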

zhang-zx avatar Dec 02 '22 22:12 zhang-zx