
When will custom training stop?


I have just started training on a custom dataset of 200k 256x256 images, and I'm surprised to see that it requires 31 hours to train one epoch on three fast 16GB GPUs. So my questions are:

  • When will training stop? After a certain number of epochs, or once the loss no longer improves?
  • Can you give a rough idea of how long it took to train the pretrained models?

soon-yau · Nov 24 '21 14:11

@soon-yau The Taming Transformers for High-Resolution Image Synthesis paper says the validation NLL reaches its minimum at around epoch 10-15, while the training NLL is at its minimum at epoch 1000:

> E. Nearest Neighbors of Samples
> One advantage of likelihood-based generative models over, e.g., GANs is the ability to evaluate NLL on training data and validation data to detect overfitting. To test this, we trained large models for face synthesis, which can easily overfit them, and retained two checkpoints on each dataset: one for the best validation NLL (at the 10th and 13th epoch for FFHQ and CelebA-HQ, respectively), and another for the best training NLL (at epoch 1000). We then produced samples from both checkpoints and retrieved nearest neighbors from the training data based on the LPIPS similarity metric [81]. The results are shown in Fig. 45, where it can be observed that the checkpoints with best training NLL (best train NLL) reproduce the training examples, whereas samples from the checkpoints with best validation NLL (best val. NLL) depict new faces which are not found in the training data.
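
As an aside, that nearest-neighbor check is easy to reproduce yourself. Here is a rough sketch (not from this repo) using the `lpips` pip package, assuming your samples and training images are already loaded as float tensors of shape (3, 256, 256) scaled to [-1, 1]:

```python
# Rough sketch of an LPIPS nearest-neighbor lookup, as described in the paper's
# appendix. Assumes the `lpips` package and images as tensors in [-1, 1].
import torch
import lpips

loss_fn = lpips.LPIPS(net="vgg")  # perceptual distance network

def nearest_neighbor(sample, train_images):
    """Return the index of the training image closest to `sample` under LPIPS."""
    with torch.no_grad():
        dists = [loss_fn(sample.unsqueeze(0), img.unsqueeze(0)).item()
                 for img in train_images]
    return min(range(len(dists)), key=dists.__getitem__)
```

If the samples from your checkpoint keep mapping onto near-identical training images, that is the overfitting symptom the paper is describing.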

So, following the paper, I am training mine for 12 epochs.

If you want to set a maximum number of epochs, simply add --max_epochs as a command-line argument to python main.py. The pytorch-lightning Trainer will receive the argument and stop training once that many epochs have been reached.
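
I believe main.py wires the argument parser to the Trainer in essentially the following way; this is a minimal sketch of the mechanism (assuming pytorch-lightning 1.x), not the repo's actual main.py:

```python
# Minimal sketch of how a CLI flag such as --max_epochs ends up on the Trainer.
import argparse
from pytorch_lightning import Trainer

parser = argparse.ArgumentParser()
parser = Trainer.add_argparse_args(parser)        # registers --max_epochs, --gpus, ...
args = parser.parse_args(["--max_epochs", "12"])

trainer = Trainer.from_argparse_args(args)
print(trainer.max_epochs)                         # -> 12, training stops after 12 epochs
```

So in your case you can just append `--max_epochs 12` to the same `python main.py ...` command you are already running.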

snoop2head · Dec 12 '21 07:12