mac-network-pytorch image_features.py output shape?

Hi! This is probably a gap in my understanding of PyTorch or h5py, but I wanted to bring it to your attention just in case it’s not.

The output of image_features.py is a batch_size*(num images in split) x 1024 x 14 x 14 numpy array. You assign the features associated with each image to batch_size contiguous indices in a slice along the first axis. I don’t understand why it’s necessary to store batch_size copies of each image’s features.

Later, when you load the data from the h5 file in the CLEVR dataloader’s getitem method in dataset.py, you index the array as if img[i] gives the features of the ith image. But based on how you initialized the h5 file, these would actually be stored at [batch_size*i:batch_size*(i+1)], not at i.

What am I missing here?

Apr 15 '19 16:04 bpiv400

image_features.py extracts features batch-wise: it extracts the features for a batch of images and inserts that batch of features into the HDF5 file. So the file does not hold batch_size copies of each image’s features.
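A minimal sketch of the batch-wise write, not the repository's actual code: a plain Python list stands in for the HDF5 dataset, and `extract_features` and `write_in_batches` are hypothetical names made up for illustration. Each loop iteration covers batch_size different images, so the slice it writes holds one row per image and row i ends up being image i.

```python
def extract_features(image_id):
    # Hypothetical stand-in for the CNN forward pass: one "feature" per image.
    return f"feat_{image_id}"


def write_in_batches(num_images, batch_size):
    # Stand-in for the HDF5 dataset: one row per image, preallocated.
    store = [None] * num_images
    for start in range(0, num_images, batch_size):
        # One batch = batch_size *different* images (the last batch may be shorter).
        batch_ids = range(start, min(start + batch_size, num_images))
        batch_features = [extract_features(i) for i in batch_ids]
        # Write the whole batch into a contiguous slice of the store.
        store[start:start + len(batch_features)] = batch_features
    return store


store = write_in_batches(num_images=10, batch_size=4)
print(store[7])  # row 7 holds the features of image 7
```

With this layout, indexing the store with a plain image index, as the dataloader does, returns that image's features.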

Apr 15 '19 16:04 rosinality

Got it. Sorry I missed that. Why do you start word embedding indices at 1, instead of at 0?

Apr 15 '19 18:04 bpiv400

So that index 0 can be used to zero-pad question sequences. I have used packed sequences in this case, though.
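A minimal sketch of that convention in plain Python, not the repository's code: `build_vocab` and `encode_and_pad` are hypothetical helpers, and the two questions are made-up examples. Word indices start at 1, so 0 never collides with a real word and can fill the tail of shorter questions.

```python
def build_vocab(questions):
    # Assign indices starting at 1; index 0 is reserved for padding.
    vocab = {}
    for question in questions:
        for word in question.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode_and_pad(questions, vocab, pad_index=0):
    # Encode each question, then right-pad to the longest one with pad_index.
    encoded = [[vocab[w] for w in q.split()] for q in questions]
    lengths = [len(seq) for seq in encoded]
    max_len = max(lengths)
    padded = [seq + [pad_index] * (max_len - len(seq)) for seq in encoded]
    return padded, lengths


questions = ["what color is the cube", "how many spheres"]
vocab = build_vocab(questions)
padded, lengths = encode_and_pad(questions, vocab)
print(padded)   # [[1, 2, 3, 4, 5], [6, 7, 8, 0, 0]]
print(lengths)  # [5, 3]
```

The `lengths` list is what a packed-sequence utility such as PyTorch's `torch.nn.utils.rnn.pack_padded_sequence` takes alongside the padded batch, so the RNN skips the padded positions.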

Apr 15 '19 23:04 rosinality