
GQN trained on CLEVR dataset

Open · loganbruns opened this issue 5 years ago · 10 comments

Thanks for the GQN implementation. I thought you might enjoy seeing some pictures of how it does when trained on a different dataset. (Albeit with a limited amount of training time; I plan to train for longer.)

[Screenshot: Screen Shot 2019-06-11 at 6.04.24 AM]

Even on the test set it works pretty well after a relatively small amount of training. It seems to generalize better than on the flat-shaded DeepMind dataset.

[Image: test set results]

I'm curious what kind of changes you might be interested in via pull request. I have some changes to the training parameters, and I've also found self-attention to improve the speed of generalization and of training in general (a rough sketch of the kind of attention block I mean is below), although that wasn't in the original paper.
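To make the idea concrete, here is a minimal sketch of a SAGAN-style non-local self-attention block over convolutional feature maps, written against TF 1.x layers. This is not the code from my branch; the function name, the projection sizes, and where it would be inserted in the GQN generator are all assumptions for illustration.

```python
# Hedged sketch: SAGAN-style self-attention over a [B, H, W, C] feature map.
# Not the actual change from my branch; names and sizes are illustrative only.
import tensorflow as tf

def self_attention(x, scope="self_attention"):
    """Applies non-local self-attention to a [B, H, W, C] feature map."""
    with tf.variable_scope(scope):
        _, h, w, c = x.get_shape().as_list()
        # 1x1 convs project the feature map into query / key / value spaces.
        f = tf.layers.conv2d(x, c // 8, 1, name="f")   # query
        g = tf.layers.conv2d(x, c // 8, 1, name="g")   # key
        v = tf.layers.conv2d(x, c, 1, name="v")        # value

        # Flatten spatial dims so every location can attend to every other.
        f_flat = tf.reshape(f, [-1, h * w, c // 8])
        g_flat = tf.reshape(g, [-1, h * w, c // 8])
        v_flat = tf.reshape(v, [-1, h * w, c])

        # Attention map over all H*W locations, then weighted sum of values.
        attn = tf.nn.softmax(tf.matmul(f_flat, g_flat, transpose_b=True))
        o = tf.reshape(tf.matmul(attn, v_flat), [-1, h, w, c])

        # Learnable residual gate, initialized to 0 so training starts from
        # the original (attention-free) behaviour.
        gamma = tf.get_variable("gamma", [], initializer=tf.zeros_initializer())
        return x + gamma * o
```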

Thanks, logan

loganbruns avatar Jun 16 '19 20:06 loganbruns

Hi logan, your results look great. May I ask what the dataset size is and how long you trained the model?

Chan

waiyc avatar Jun 19 '19 02:06 waiyc

@waiyc, approximately 100k iterations on ~15k training examples. Not as long as I'd have liked, nor with as much data as I'd have liked. I'm thinking of generating more data and retraining, maybe at the size of the original CLEVR dataset, which was significantly larger. (Waiting on some more disks.)

loganbruns avatar Jun 19 '19 04:06 loganbruns

@loganbruns That looks great, thank you for sharing these results! :) I'd be very happy to include a data loader for the CLEVR dataset (either from raw files or from pre-processed tfrecords). I'm currently in the middle of updating the data loader to a more stable and tf 1.12.1 compatible version; the update should be online within the week. So feel free to send a pull request for a CLEVR data loader. It should live under data_provider/clevr_provider.py and be modelled after the updated gqn_provider.py. I'm also very interested in (self-)attention mechanisms for the model since they were used in follow-up papers like the localization and mapping one. I'm happy to discuss this on a separate issue thread.
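For reference, a rough sketch of what an input_fn in data_provider/clevr_provider.py could look like, assuming the CLEVR scenes have already been converted into GQN-style tfrecords (JPEG frames plus 5-DoF camera poses per scene). The feature keys, context/sequence sizes, image resolution, and the returned (context, target) structure are all assumptions here and would need to be aligned with the updated gqn_provider.py.

```python
# Hedged sketch of a CLEVR input pipeline; all constants and feature names
# are assumptions and must match the actual converted tfrecords.
import tensorflow as tf

_CONTEXT_SIZE = 5      # context views fed to the model (assumed)
_SEQUENCE_SIZE = 10    # views rendered per CLEVR scene (assumed)
_IMG_SIZE = 64         # image resolution (assumed)

def _parse_scene(example_proto):
    features = {
        "frames": tf.FixedLenFeature([_SEQUENCE_SIZE], tf.string),
        "cameras": tf.FixedLenFeature([_SEQUENCE_SIZE * 5], tf.float32),
    }
    parsed = tf.parse_single_example(example_proto, features)
    # Decode the JPEG frames and scale them to [0, 1] floats.
    frames = tf.map_fn(
        lambda s: tf.image.convert_image_dtype(
            tf.image.decode_jpeg(s, channels=3), tf.float32),
        parsed["frames"], dtype=tf.float32)
    frames.set_shape([_SEQUENCE_SIZE, _IMG_SIZE, _IMG_SIZE, 3])
    cameras = tf.reshape(parsed["cameras"], [_SEQUENCE_SIZE, 5])
    # Split into context views and a single query/target view.
    context_frames = frames[:_CONTEXT_SIZE]
    context_cameras = cameras[:_CONTEXT_SIZE]
    target_frame = frames[_CONTEXT_SIZE]
    query_camera = cameras[_CONTEXT_SIZE]
    return (context_frames, context_cameras, query_camera), target_frame

def clevr_input_fn(file_pattern, batch_size, num_epochs=None):
    files = tf.data.Dataset.list_files(file_pattern)
    dataset = tf.data.TFRecordDataset(files)
    dataset = dataset.map(_parse_scene, num_parallel_calls=4)
    dataset = dataset.shuffle(1024).repeat(num_epochs).batch(batch_size)
    return dataset.prefetch(1)
```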

ogroth avatar Jun 19 '19 10:06 ogroth

@ogroth, thanks for the reference. I'll read it. I also created a separate issue to discuss perhaps merging some of the changes. Regarding CLEVR, since I had to modify the dataset generation anyway, I also added code to those changes to convert the output into the DeepMind dataset format. I was thinking of asking them whether they'd take some of the changes so that others could use their generator to produce data for GQNs. That is what I was thinking, at least.

loganbruns avatar Jun 19 '19 14:06 loganbruns

@loganbruns would you mind sharing the conversion code you used to convert the CLEVR dataset to the GQN tfrecords format? I am also creating my own dataset and still struggling to understand the GQN dataset format well enough to make it work with this implementation.

phongnhhn92 avatar Jun 21 '19 12:06 phongnhhn92

Hi @loganbruns, the new input pipeline is now in master. Would you mind modelling your input_fn for CLEVR after this one? Also, you can include data generation and conversion code for CLEVR under data_provider. I'm happy to review your pull request. :)

ogroth avatar Jun 21 '19 13:06 ogroth

@phongnhhn92, here is the source:

https://github.com/loganbruns/clevr-dataset-gen/blob/clevr_gqn/image_generation/convert_gqn.py

Just let me know if you have any questions.
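For anyone who wants the gist without reading the script: below is a hedged sketch of the kind of per-scene serialization such a converter performs, with all N JPEG-encoded frames stored under "frames" and the flattened (x, y, z, yaw, pitch) camera poses under "cameras", following the feature names the DeepMind gqn-datasets reader expects. The helper name and the exact pose ordering coming out of the renderer are assumptions; the linked convert_gqn.py is the authoritative version.

```python
# Hedged sketch: one tf.train.Example per scene in DeepMind GQN-style layout.
import tensorflow as tf

def scene_to_example(jpeg_frames, camera_poses):
    """jpeg_frames: list of N JPEG-encoded byte strings.
    camera_poses: list of N (x, y, z, yaw, pitch) tuples (ordering assumed)."""
    flat_cameras = [float(v) for pose in camera_poses for v in pose]
    features = tf.train.Features(feature={
        "frames": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=jpeg_frames)),
        "cameras": tf.train.Feature(
            float_list=tf.train.FloatList(value=flat_cameras)),
    })
    return tf.train.Example(features=features)

# Writing all scenes of a split into a single tfrecord file (file name made up):
# with tf.python_io.TFRecordWriter("clevr_train.tfrecord") as writer:
#     for frames, poses in scenes:
#         writer.write(scene_to_example(frames, poses).SerializeToString())
```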

loganbruns avatar Jun 22 '19 23:06 loganbruns

@ogroth , thanks. I'll take a look.

loganbruns avatar Jun 22 '19 23:06 loganbruns

@loganbruns From your convert_gqn.py I can see that you saved each scene, with its N frames, as one record in the tfrecord. Since you mentioned you trained the model with 15k training examples, you generated 15k scenes per tfrecord as training data.

Is my understanding correct?

waiyc avatar Jun 26 '19 00:06 waiyc

@waiyc, yes, 15k scenes, each with N frames. I generated one file each for train, val, and test. The train tfrecord file had 15k scenes. For the DeepMind dataset, each tfrecord file has 5k scenes.
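If it helps to double-check that layout, here is a small snippet for counting how many serialized scenes (tf.train.Example records) ended up in a given shard; the file name is made up.

```python
# Count the serialized scenes in one tfrecord shard (file name is illustrative).
import tensorflow as tf

count = sum(1 for _ in tf.python_io.tf_record_iterator("clevr_train.tfrecord"))
print("scenes in shard:", count)  # expected ~15k for the train split described above
```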

loganbruns avatar Jun 26 '19 07:06 loganbruns