TRACE Performance very different to Action Genome baselines

Open zyong812 opened this issue 3 years ago • 3 comments

Thanks for sharing the nice work!

But I find that the performance reported in the paper is very different from that of the methods in "Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs" and "detecting human-object relationships in videos". What causes this difference?

Oct 20 '21 14:10 zyong812

Actually, when we began this project, we could not reproduce the performance reported in "Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs". Our model consistently outperforms it by a large margin.

This is not an isolated case. You may refer to the paper "Spatial-Temporal Transformer for Dynamic Scene Graph Generation", where a similar phenomenon is reported.

As for "detecting human-object relationships in videos", I haven't read it yet. I'll reply if I find some clues.

Oct 21 '21 01:10 tyshiwo1

OK, thanks for replying.

Oct 21 '21 02:10 zyong812

I can think of one possible cause. AG is actually an HOI dataset. However, SGG metrics such as PredCls and SGCls enumerate all possible object pairs (e.g. <shoe, bed> in a scene containing a person, a shoe, and a bed).

Our setting for the AG dataset, in contrast, follows the RelDN repo and therefore restricts the subject to be the person. We do not apply this restriction for VidVRD.
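
To make the difference concrete, here is a minimal sketch of the two ways of building candidate pairs. It is only an illustration and not the actual TRACE or RelDN code: the function names `all_ordered_pairs` and `person_subject_pairs` and the `person_label` parameter are hypothetical, and the labels are taken from the shoe/bed example above.

```python
from itertools import permutations


def all_ordered_pairs(labels):
    """Enumerate every ordered (subject, object) pair of distinct objects,
    as a generic SGG-style evaluation would (assumed behaviour)."""
    return list(permutations(labels, 2))


def person_subject_pairs(labels, person_label="person"):
    """Keep only the pairs whose subject is the person (HOI-style restriction)."""
    return [(s, o) for s, o in all_ordered_pairs(labels) if s == person_label]


# Scene from the example above: a person, a shoe, and a bed.
labels = ["person", "shoe", "bed"]

all_pairs = all_ordered_pairs(labels)
hoi_pairs = person_subject_pairs(labels)

print(len(all_pairs), all_pairs)  # 6 candidates, including ("shoe", "bed")
print(len(hoi_pairs), hoi_pairs)  # 2 candidates, both with "person" as subject
```

With three objects, the unrestricted enumeration yields six candidate pairs, while the person-as-subject restriction leaves two. Since recall-style metrics rank candidates within a fixed budget, a different candidate set can plausibly change the reported numbers.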

Oct 22 '21 06:10 tyshiwo1
