
How long does training usually take

Open 11lucky111 opened this issue 1 year ago • 1 comments

Whether I use 2x NVIDIA GeForce RTX 2080 Ti, 4x Tesla V100-SXM2-32GB, or 8x Tesla V100-SXM2-32GB, it takes about 20 days to train shepard_metzler_5_parts. I want to know how long training on shepard_metzler_5_parts usually takes.

11lucky111 avatar Oct 14 '23 08:10 11lucky111

[Edit: Training the model on the Shepard Metzler dataset for 300k iterations should be enough. These are the performances I remember getting:

  • 12 layers and shared_core=false:
    • ~3 it/s with an RTX 3090
    • ~5 it/s with a Tesla V100 (Google Colab)
  • With 12 layers and shared_core=true, it took about 24 hours to train the model for 300k steps on an RTX 2080; I observed similar speeds with an RTX 3070. ]
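For a rough sense of how iteration rate translates into wall-clock time, here is a minimal back-of-the-envelope sketch. The rates are the approximate figures quoted above; actual numbers depend on GPU, batch size, and model configuration.

```python
# Rough wall-clock estimate: total steps / iteration rate.
# The it/s figures are the approximate numbers reported above,
# not benchmarks measured from this repo.
TOTAL_STEPS = 300_000

for gpu, it_per_s in [("RTX 3090", 3.0), ("Tesla V100", 5.0)]:
    hours = TOTAL_STEPS / it_per_s / 3600
    print(f"{gpu}: ~{hours:.1f} h for {TOTAL_STEPS:,} steps")
# At ~3 it/s, 300k steps is a bit over a day; at ~5 it/s, well under.
```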

There is a small mistake in almost all implementations of the GQN model that gives the model an enormous number of parameters. You'll see a shared_core parameter (in the GQN constructor, iirc); I would suggest setting it to true. Setting it to false creates a separate VAE for each time-step, which increases the number of parameters dramatically, and consequently the training time.
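To make the parameter blow-up concrete, here is a small sketch of how the total scales with the number of generation steps. The per-core count and function name are hypothetical stand-ins, not values measured from torch-gqn.

```python
# Illustrative parameter count: one generator core reused for every
# step (shared_core=True) vs. a fresh copy of the core per step.
# core_params is a hypothetical per-core count, not measured from torch-gqn.
def generator_params(core_params: int, num_steps: int, shared_core: bool) -> int:
    """Total generator parameters across num_steps DRAW steps."""
    return core_params if shared_core else core_params * num_steps

core = 5_000_000   # e.g. a ConvLSTM core with ~5M parameters (assumed)
steps = 12         # 12 generation steps, matching the layer count above
print(generator_params(core, steps, shared_core=True))   # → 5000000
print(generator_params(core, steps, shared_core=False))  # → 60000000
```

With 12 steps, the unshared variant carries 12x the generator parameters for the same architecture, which is why training slows down so much.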

This is not how the DRAW (and ConvDRAW) model was designed (the GQN generator is a ConvDRAW model). DRAW is a recurrent model that generates an image in a fixed number of steps, taking the hidden state of the previous time-step as input. Setting shared_core to true creates one ConvDRAW generator and uses it recurrently, as intended. Setting it to false just makes a chain of separate VAE models, which is not a ConvDRAW model.
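The wiring difference can be sketched in a few lines. This is a framework-agnostic toy, with a hypothetical Core class standing in for the ConvLSTM generator core; the point is only how many distinct modules exist in each case.

```python
# Toy sketch of the two wiring styles. Core is a hypothetical stand-in
# for the generator core; its update rule is not the real DRAW step.
class Core:
    def __call__(self, hidden, z):
        return hidden + z  # placeholder for one recurrent update

def generate(num_steps, shared_core):
    if shared_core:
        cores = [Core()] * num_steps                 # one module, reused each step
    else:
        cores = [Core() for _ in range(num_steps)]   # a new module per step
    hidden = 0
    for t in range(num_steps):
        hidden = cores[t](hidden, z=1)               # hidden state threads through
    distinct = len({id(c) for c in cores})
    return hidden, distinct

out, distinct = generate(12, shared_core=True)
print(distinct)  # → 1 (a single shared core, used recurrently)
```

Either way the hidden state is passed from step to step; only the shared version matches DRAW's single recurrent core.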

I am not sure if this was intended or not by the repository authors.

Note: @lihao11 My thesis was about this model, and I made a modification that makes it possible to create a multi-layer GQN generator, where each layer acts as a proper RNN. It's also possible to set a separate number of time-steps per layer and a resolution scaling (i.e. layer 1 generates low res, layer 2 doubles the resolution, ...). If you need it, I can ask my university if I'm allowed to make the code public.

Eagle-E avatar Oct 29 '23 19:10 Eagle-E