
Grabbing Features for an SVM

Open B2Gdevs opened this issue 5 years ago • 8 comments

Hey everyone, I am trying to get the features used for the masks. I have looked in roi_feature_transform and mask_net in model_builder.py

However, I don't understand the output of the "mask_feat" in the mask_net method.

My current understanding is that roi_feature_transform gives the features used for the bounding boxes and masks. The mask predictions come out with shape (num_predictions, num_classes, 14, 14), where 14 is the mask resolution.

But the features have shape (num_predictions, 256, 14, 14); flattened, that is 256 * 14 * 14 = 50,176 values, which seems way too high.

I am 100% positive I am misunderstanding something and would like to know how I can extract the features and map them to the correct label.

Thank you for your time.

B2Gdevs avatar Oct 19 '18 00:10 B2Gdevs

256 * 14 * 14 seems correct to me. It's definitely a large number for an SVM, but it's not an uncommon size for an intermediate CNN layer. You could access a later stage in the mask branch, which might have a slightly smaller feature size, but probably on the same order of magnitude.
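
To see why that size feels large, here is a quick sketch of what flattening the mask-branch features yields. The shapes are hypothetical stand-ins for the tensors discussed above, not actual Detectron outputs:

```python
import numpy as np

# Hypothetical stand-in for the intermediate mask-branch features
# discussed above: (num_predictions, 256, 14, 14).
num_predictions = 5
mask_feats = np.zeros((num_predictions, 256, 14, 14))

# Flattening each prediction's feature map yields one long vector per ROI.
flat = mask_feats.reshape(num_predictions, -1)
print(flat.shape)  # (5, 50176) -- i.e. 256 * 14 * 14 = 50176 features per ROI
```

That is an order of magnitude more dimensions than, say, the 2048-d pooled box features, which is why it feels unwieldy as SVM input even though it is a normal size for an intermediate CNN layer.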

achalddave avatar Oct 25 '18 03:10 achalddave

It seems the paper shows that, with the ResNet backbone, the masks are produced after this stage. (screenshot from 2018-10-19 16-43-14 attached)

I believe I was misunderstanding which features propose the masks. The ROI features found in model_builder.py under the variable name "box_feats" are actually what I needed; their shape is (num_predictions, 2048, 1, 1), which gives 2048 features when flattened. That's much more reasonable.

Thanks for the feedback. Any other suggestions on training an SVM would be great; I cannot retrain the softmax layer for the life of me, so I am resorting to SVMs for now.

B2Gdevs avatar Oct 28 '18 19:10 B2Gdevs

@B2Gdevs Hi, I am trying to extract the 2048 features for specific bounding boxes. Can you share how to do that?

yangshao avatar Jan 01 '19 16:01 yangshao

@yangshao

Yeah, no problem. If you look at the model_builder.py file you will see the variable "box_feats"; those are the features. To keep track of them, just follow the same format as the cls_scores ('cls' is short for "classification"). The index in any 'cls' variable is the class ID, so if 'dog' has a class ID of 10, the scores for those boxes will be at index 10.
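
As a rough sketch of that indexing convention, mapping each ROI's features to its predicted class ID could look like the following. The array names and shapes here are hypothetical stand-ins for the Detectron variables, not the real ones:

```python
import numpy as np

# Hypothetical stand-ins for per-ROI classification scores and pooled features,
# shaped like the cls_scores and box_feats discussed above.
num_rois, num_classes, feat_dim = 4, 81, 2048
cls_scores = np.random.rand(num_rois, num_classes)
box_feats = np.random.rand(num_rois, feat_dim, 1, 1)

# Mirror the 'cls' convention: the index IS the class ID.
features_by_class = {c: [] for c in range(num_classes)}
for roi_idx in range(num_rois):
    class_id = int(cls_scores[roi_idx].argmax())           # predicted class for this ROI
    features_by_class[class_id].append(box_feats[roi_idx].reshape(-1))  # 2048-d vector
```

The resulting dictionary pairs each class ID with the flattened feature vectors of the ROIs predicted as that class, which is the labeled-feature mapping you would hand to an SVM.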

Keeping track of that gets a bit awkward, but if you follow the code you will notice that in model_builder.py the variable 'return_dict' is a dictionary; just insert the features into that.

The rest, I believe, is in the method 'im_detect_all' found in the test.py file. Follow how that method returns the scores and reformat the features into the same format. It should be relatively easy since the scores show the pattern (it took me 8 hours to find the features, reformat them, and save them).

Since you will most likely be doing inference, you will need to modify the test_engine.py file to fit your new modifications. That is where I save the features, scores, etc. to HDF5 files.

Good luck.

B2Gdevs avatar Jan 01 '19 21:01 B2Gdevs

@B2Gdevs Thanks for your quick reply, but I am a little confused. What I want to do is this: during inference, given an image and an arbitrary bounding box (one I specify myself rather than one produced by detection, and whose label I don't know), I want to get the feature representation of that bounding box. I don't quite understand the class part.

yangshao avatar Jan 01 '19 21:01 yangshao

@yangshao If you want the features of a bounding box that is already given, you will have to modify Detectron in the way I described before. Then save the features along with the detections they came from. Next, compute the Intersection over Union (IoU) between your ground-truth bounding boxes and the detections to determine which detection best matches each ground-truth box. Finally, build a new dataset from those best-matching detections and their features.
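
The IoU matching step could be sketched like this. The function names and the 0.5 threshold are my own choices for illustration, not part of Detectron:

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)          # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_detections(gt_boxes, det_boxes, iou_thresh=0.5):
    """For each ground-truth box, return the index of the best-overlapping
    detection, or None if no detection clears the threshold."""
    matches = []
    for gt in gt_boxes:
        overlaps = [iou(gt, det) for det in det_boxes]
        best = int(np.argmax(overlaps)) if overlaps else None
        matches.append(best if best is not None and overlaps[best] >= iou_thresh else None)
    return matches
```

For example, `match_detections([(0, 0, 10, 10)], [(1, 1, 9, 9), (50, 50, 60, 60)])` matches the ground-truth box to the first detection, since their IoU is 0.64.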

You say you don't understand the class part, but once you begin modifying Detectron to do what you are asking, you will. If you find any variable that begins with 'cls', then the format of that variable is indexed by the class IDs in your dataset.

For example, 'dog' has a class ID of 10, so every detection predicted to be a 'dog' will be at index 10 of any variable beginning with 'cls'.

It is very important to understand that before you begin modifying Detectron to do what you want.

So the steps to do what you want are:

1. Modify Detectron as described in my previous comment to save the features of detections.
2. Run inference on the images you have ground truth for, then use IoU to match the best detections to those ground-truth labels.
3. Save what you found in step 2 to a new dataset.
4. Classify the new dataset using any means you want: SVM, NN, KNN, etc.
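
As one example of step 4, here is a minimal k-NN classifier in plain NumPy. The feature vectors below are toy stand-ins; in practice they would be the saved 2048-d box_feats and their matched ground-truth labels:

```python
import numpy as np

def knn_predict(train_feats, train_labels, query_feats, k=3):
    """Classify each query vector by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    train_feats = np.asarray(train_feats, dtype=float)
    train_labels = np.asarray(train_labels)
    preds = []
    for q in np.asarray(query_feats, dtype=float):
        dists = np.linalg.norm(train_feats - q, axis=1)   # distance to every training vector
        nearest = np.argsort(dists)[:k]                   # indices of the k closest
        votes, counts = np.unique(train_labels[nearest], return_counts=True)
        preds.append(votes[np.argmax(counts)])            # majority label wins
    return preds
```

Swapping this for an SVM (e.g. scikit-learn's LinearSVC) changes only the classifier; the feature-saving pipeline in steps 1-3 stays the same.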

B2Gdevs avatar Jan 02 '19 16:01 B2Gdevs

@B2Gdevs This solution seems weird to me. So you assume the model will predict a bounding box close to the given bounding box? What if the model cannot? For example, the given bounding box's class might not be among the training classes.

yangshao avatar Jan 02 '19 16:01 yangshao

@yangshao Then you would have to hope that it is similar to the objects the model has been trained on. E.g., if your model was trained on dogs alone and you have a ground-truth bounding box of a cat, you will have to hope that the model detects that there is an object there and gives you a bounding box for that region.

Detectron doesn't create features for ground-truth bounding boxes; it creates features from the image for the region proposals generated by the RPN, which the ground-truth bounding boxes are only compared against. If you want features for your own bounding box, you will have to modify Detectron more heavily and track which region each feature comes from, or do the steps I listed before.

However, since you just want features, it is probably best not to use Detectron at all: use a pretrained CNN and crop the image to the bounding box before feeding it in.
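
The cropping step could be sketched like this, with a naive nearest-neighbor resize written out in NumPy for illustration (in practice you would use a library resize, normalize, and then feed the crop to the pretrained CNN):

```python
import numpy as np

def crop_and_resize(image, box, out_size=(224, 224)):
    """Crop an HxWxC image array to box=(x1, y1, x2, y2) and nearest-neighbor
    resize to out_size, a common input size for pretrained CNNs."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]                             # crop rows then columns
    h, w = crop.shape[:2]
    rows = np.arange(out_size[0]) * h // out_size[0]       # nearest source row per output row
    cols = np.arange(out_size[1]) * w // out_size[1]       # nearest source column per output column
    return crop[rows][:, cols]
```

The resized crop can then be passed through any pretrained backbone, and the activations of a late layer used as the feature vector for that box.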

B2Gdevs avatar Jan 10 '19 23:01 B2Gdevs