
Number of objects in CLEVRwithmasks scenes

Open vadimkantorov opened this issue 3 years ago • 9 comments

All fields seem to be padded with zeros up to 11 objects. How can I find out the true number of objects?

vadimkantorov avatar Mar 02 '21 18:03 vadimkantorov

What are the valid integer codes for colors and materials? Non-zero values? Colors and materials are encoded as uint8 in CLEVR (with masks) and as strings in CLEVR.

vadimkantorov avatar Mar 02 '21 18:03 vadimkantorov

An example of a dump I get:

{
  'color': [0, 1, 2, 3, 1, 1, 4, 5, 0, 0, 0],
  'material': [0, 1, 1, 2, 2, 1, 2, 2, 0, 0, 0],
  'shape': [0, 1, 2, 1, 1, 3, 3, 3, 0, 0, 0],
  'size': [0, 1, 1, 2, 2, 2, 2, 1, 0, 0, 0],
  'visibility': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
  'pixel_coords': [[0.0, 0.0, 0.0], [216.0, 92.0, 11.397212982177734], [184.0, 127.0, 9.41761589050293], [116.0, 81.0, 13.153035163879395], [51.0, 121.0, 10.44654655456543], [123.0, 129.0, 10.018261909484863], [36.0, 109.0, 11.129423141479492], [160.0, 176.0, 7.559253692626953], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
  'rotation': [0.0, 206.87570190429688, 158.63943481445312, 330.7286071777344, 31.30453109741211, 198.59092712402344, 2.6792359352111816, 243.39840698242188, 0.0, 0.0, 0.0],
  'x': [0.0, 0.7548010349273682, 1.6617735624313354, -2.911912679672241, -1.656102180480957, 0.1737128645181656, -2.6883020401000977, 2.8262786865234375, 0.0, 0.0, 0.0],
  'y': [0.0, 1.7722225189208984, -0.6132868528366089, 0.35031434893608093, -2.893202543258667, -1.555863618850708, -2.886444568634033, -2.4974899291992188, 0.0, 0.0, 0.0],
  'z': [0.0, 0.699999988079071, 0.699999988079071, 0.3499999940395355, 0.3499999940395355, 0.3499999940395355, 0.3499999940395355, 0.699999988079071, 0.0, 0.0, 0.0]
}

As you can see, the first object seems to have zeros everywhere, but visibility is 1.0 :/

vadimkantorov avatar Mar 02 '21 18:03 vadimkantorov

Hi Vadim,

The first object has all-zero attributes (color, material, shape, and size) because it represents the background. As you may have observed, the first segmentation mask (for any scene) contains the background pixels.
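As a toy sketch of this convention (the array shapes here are illustrative, not the dataset's real resolution), the background can be separated from the foreground by treating the first mask slot specially:

```python
import numpy as np

# Toy masks array shaped [num_slots, height, width, 1]; in CLEVR (with
# masks) the first slot holds the background segmentation, as noted above.
# The 4x4 resolution and pixel values here are made up for illustration.
masks = np.zeros((11, 4, 4, 1), dtype=np.uint8)
masks[0, :, :2] = 255  # background occupies the left half (toy data)
masks[1, :, 2:] = 255  # a single foreground object on the right half

background_mask = masks[0]               # slot 0 = background pixels
foreground_mask = masks[1:].max(axis=0)  # union of all object masks

print(int((background_mask > 0).sum()))  # 8
print(int((foreground_mask > 0).sum()))  # 8
```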

The mapping from integers to words for CLEVR features is as follows:

{
  "material": {"metal": 2, "rubber": 1}, 
  "size": {"large": 1, "small": 2}, 
  "color": {"cyan": 2, "red": 1, "brown": 5, "gray": 6, "purple": 7, "yellow": 8, "blue": 4, "green": 3}, 
  "shape": {"cube": 3, "sphere": 1, "cylinder": 2}
}

And you can find the number of visible objects in any scene using the visibility vector. Note that it codes both the background and foreground objects as 1.0.
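Putting the mapping and the visibility vector together, a small decoding helper might look like the following sketch (the dict literals just restate the mapping above, inverted to integer→word; the example arrays are taken from the dump earlier in this thread):

```python
import numpy as np

# Integer -> word tables, inverted from the mapping above.
# Code 0 is the background / padding slot in every attribute vector.
MATERIALS = {1: "rubber", 2: "metal"}
SIZES = {1: "large", 2: "small"}
SHAPES = {1: "sphere", 2: "cylinder", 3: "cube"}
COLORS = {1: "red", 2: "cyan", 3: "green", 4: "blue",
          5: "brown", 6: "gray", 7: "purple", 8: "yellow"}

def num_foreground_objects(visibility):
    # visibility marks the background slot as 1.0 too, so subtract it.
    return int(np.sum(visibility)) - 1

def decode(codes, table, visibility):
    # Decode only visible foreground slots (code 0 = background/padding).
    return [table[c] for c, v in zip(codes, visibility) if v > 0 and c > 0]

# The example dump from this thread:
visibility = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
color = [0, 1, 2, 3, 1, 1, 4, 5, 0, 0, 0]

print(num_foreground_objects(visibility))  # 7
print(decode(color, COLORS, visibility))
# ['red', 'cyan', 'green', 'red', 'red', 'blue', 'brown']
```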

Hope this helps, Rish

rishabhkabra avatar Mar 02 '21 22:03 rishabhkabra

Thanks! It would help to add these to the README!

I've got another question: how were the train/test splits done (for both CLEVR6 and CLEVR10)? Could you provide the file lists?

vadimkantorov avatar Mar 02 '21 22:03 vadimkantorov

In Multi-Object Representation Learning with Iterative Variational Inference, we trained our model only on images containing 3-6 visible foreground objects (an inclusive range), i.e. CLEVR6. We then assessed the model's generalization to the full dataset (where scenes can contain up to 10 objects).

You can construct the train split from CLEVR (with masks) by writing a filtering function that returns True when sum(visibility) <= 7. Sorry, it won't be possible to provide an exact file list.
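A minimal sketch of that predicate in plain Python (a tf.data version would apply the same threshold with tf.reduce_sum inside Dataset.filter):

```python
def is_clevr6_scene(visibility):
    """CLEVR6 keeps scenes with at most 6 visible foreground objects.

    sum(visibility) also counts the background slot, hence the threshold 7.
    """
    return sum(visibility) <= 7

# 6 foreground objects + background -> kept
print(is_clevr6_scene([1.0] * 7 + [0.0] * 4))  # True
# 7 foreground objects + background -> dropped
print(is_clevr6_scene([1.0] * 8 + [0.0] * 3))  # False
```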

rishabhkabra avatar Mar 02 '21 22:03 rishabhkabra

Do I understand correctly that sum(visibility) <= 7 was train and sum(visibility) > 7 was test? Or did the test set also contain some (or all?) of the sum(visibility) <= 7 images?

How many images were in train/test?

Basically, I'm trying to figure out the object discovery evaluation setup for Slot Attention which I think matched your setup (https://github.com/google-research/google-research/issues/595)

Thank you!

vadimkantorov avatar Mar 02 '21 22:03 vadimkantorov

CLEVR6 := sum(visibility) <= 7, whereas CLEVR10 was the whole dataset (any number of visible objects). That should reflect the terminology in the Slot Attention paper.

rishabhkabra avatar Mar 02 '21 22:03 rishabhkabra

Does the test split ensure it doesn't overlap too much with train? E.g. are train images excluded? Is there any filtering w.r.t. object properties? Do you still have the train/test sizes somewhere, maybe in comments inside the arXiv submission? :)

vadimkantorov avatar Mar 02 '21 23:03 vadimkantorov

Is it true that:

  • the first 70k examples are used for train (further filtered to contain <= 6 objects), and all of them are used in training
  • the remaining 30k examples are used for test (further filtered to contain <= 6 objects or <= 10 objects); from the filtered set, 320 examples are sampled uniformly

?

vadimkantorov avatar Mar 03 '21 00:03 vadimkantorov