SGDiff
Scene graph encoding choice
From the paper, it seems that the scene graph is represented as text triplets, and that you encode these triplets with a graph encoder. Is my assumption correct, or are image features also used for scene graph encoding? If the triplets are encoded with a graph model, what kind of graph model do you use to encode the textual scene graph information? From the code, it looks like BERT is used to process the text in the scene graph.