ScanRefer

An utterance refers to more than one object

Open linhaojia13 opened this issue 2 years ago • 0 comments

As can be seen below, in scene scene0011_00, which is in the val split, the utterance for one chair is "This is a brown chair. There are many identical chairs setting around the table it sets at." At least 4 chairs match this utterance. Such ambiguous descriptions in the training set may still provide some supervision signal that helps the model learn vision-language alignment, but in the validation set they do not help us evaluate the model's performance.

[Image]

linhaojia13, May 15 '23 06:05