
[Question] SGDet vs SGCls (VG)

Open sharifza opened this issue 4 years ago • 4 comments

I have a question: I don't understand why (in Visual Genome) SGDet shows such a small improvement over Neural Motifs, whereas SGCls shows such a large improvement. Isn't the only difference the region proposal network?

sharifza avatar Jan 03 '20 10:01 sharifza

Now I understand that your reported numbers are in fact not comparable to those of Neural Motifs. I consider this some sort of [unintended?] mistake in reporting the results.

In NM (and most of the previous works), SGCls is defined as a setting where bounding boxes are given but edges are not, and we evaluate the quality of the detected and classified edges. In your work, you have redefined SGCls as a setting where both bounding boxes and edges are given, and the goal is to evaluate the quality of classifying those edges. While I understand your motivation behind this change (given the name "Scene Graph Classification"), putting these under the same title in the table will totally mislead the community.

sharifza avatar Apr 02 '20 16:04 sharifza

@sharifza if you could share the code fixing the evaluation of the models in this repo, it would be great! I still see they rank triplets here https://github.com/NVIDIA/ContrastiveLosses4VRD/blob/master/lib/datasets_rel/task_evaluation_vg_and_vrd.py#L84, so I'm not sure where exactly their evaluation goes wrong.

bknyaz avatar Nov 11 '20 19:11 bknyaz

@bknyaz I avoided using this repository for my research. No one responded to my complaint for a year. The mentioned evaluation issue affects the heart of this paper's contribution and calls the validity of all its results into question. There are other repositories that I recommend you take a look at: Neural Motifs [PyTorch 0.3], Depth-VRD (Neural Motifs [PyTorch > 1.0]), and the recent benchmark by @kaihuatang. Kaihua also pointed this issue out here (Two Common Misunderstandings in SGG Metrics).

sharifza avatar Jan 02 '21 17:01 sharifza

The main problem is that the evaluation for VRD and VG is done in the same file even though the metrics are slightly different. The metrics used in VRD are the following:

  • predicate detection (PredDet): predicate prediction given a pair of localized objects (both bounding boxes and labels);
  • phrase detection (PhrDet): localize the whole phrase (subject, predicate, object) in the image with a single bounding box;
  • relationship detection (RelDet): detect the triplets (subject, predicate, object), localizing subject and object with a pair of bounding boxes.

The metrics used in VG are:

  • predicate classification (PredCls): predict the relationships (edges) among object pairs given a set of ground-truth bounding boxes and labels;
  • phrase classification (PhrCls) or scene graph classification (SGCls): predict the triplets (subject, predicate, object) (edges and labels) given a set of localized objects;
  • scene graph generation (SGGen) or scene graph detection (SGDet): predict the bounding boxes and the triplets in the image; an object is considered correct if it has at least 0.5 IoU overlap with the ground-truth bounding box.
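For reference, the 0.5 IoU matching rule used in SGDet can be sketched as follows (a minimal standalone implementation of intersection-over-union, not code from this repository):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection matches a ground-truth object only when IoU >= 0.5:
print(iou([0, 0, 10, 10], [5, 0, 15, 10]) >= 0.5)  # False (IoU = 1/3)
```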

In PredDet, the pairs (subject, object) are given, as pointed out in this issue, whereas in PredCls and SGCls they are not. This is the problem with this implementation.
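To make the difference concrete, here is a toy sketch of the two protocols (my own code with hypothetical scores, not this repository's evaluation): when ground-truth pairs are given, every GT pair is guaranteed a prediction slot, which can inflate recall compared to ranking all candidate triplets jointly as NM does.

```python
import numpy as np

# Toy scene: three candidate (subject, object) pairs, three predicate classes.
pair_scores = {
    (0, 1): np.array([0.1, 0.2, 0.6]),
    (1, 2): np.array([0.5, 0.3, 0.1]),
    (2, 0): np.array([0.9, 0.8, 0.7]),  # confident scores on a non-GT pair
}
gt_triplets = {(0, 1, 2), (1, 2, 0)}  # (subject, object, predicate)

def recall_given_pairs(pair_scores, gt_triplets):
    """PredDet-style: predicates are scored only on the given GT pairs."""
    hits = sum(
        1 for (s, o, p) in gt_triplets
        if np.argmax(pair_scores[(s, o)]) == p
    )
    return hits / len(gt_triplets)

def recall_joint_ranking(pair_scores, gt_triplets, k):
    """PredCls/SGCls-style: rank ALL (subject, object, predicate) triplets
    jointly by score and keep the global top-K."""
    triplets = [
        ((s, o, p), score)
        for (s, o), scores in pair_scores.items()
        for p, score in enumerate(scores)
    ]
    top_k = sorted(triplets, key=lambda t: -t[1])[:k]
    hits = sum(1 for (t, _) in top_k if t in gt_triplets)
    return hits / len(gt_triplets)

print(recall_given_pairs(pair_scores, gt_triplets))         # 1.0
print(recall_joint_ranking(pair_scores, gt_triplets, k=2))  # 0.0
```

In this toy case the given-pairs protocol reports perfect recall, while joint ranking reports zero at K=2 because the confident scores on the non-GT pair crowd out the ground-truth triplets.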

Hope this helps! 👍

sigeek avatar Apr 29 '21 16:04 sigeek