unleashing-transformers

How to train with custom dataset?

Open eeyrw opened this issue 2 years ago • 3 comments

Any hints or tips?

eeyrw • Sep 30 '22 09:09

Take a look in set_up_hparams.py and data_utils.py.

If you set --dataset=custom and provide a path with the --custom_dataset_path flag during both training stages, that should give you the functionality you need.
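As a rough sketch of what that looks like for both stages, the snippet below builds the two command lines. It assumes the two stages correspond to train_vqgan.py and train_sampler.py run from the repository root; the helper name `stage_commands` and the example path are made up for illustration, and any other flags the scripts require are deliberately left out.

```python
import shlex


def stage_commands(custom_dataset_path):
    """Build one command per training stage (hypothetical helper).

    Only the two flags mentioned in this thread are included; the real
    scripts will likely need further flags that are not shown here.
    """
    flags = [
        "--dataset=custom",
        f"--custom_dataset_path={custom_dataset_path}",
    ]
    # Assumption: stage 1 is train_vqgan.py and stage 2 is train_sampler.py.
    scripts = ("train_vqgan.py", "train_sampler.py")
    return [["python", script, *flags] for script in scripts]


if __name__ == "__main__":
    # "/path/to/images" is a placeholder, not a path from the repository.
    for cmd in stage_commands("/path/to/images"):
        print(shlex.join(cmd))
```

The point is only that the same two flags go to both stages; the printed commands are a starting point to extend, not complete invocations.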

Let us know how that goes, happy to help if you get stuck with any difficult issues!

peterhessey • Sep 30 '22 09:09

Another way to use your own dataset is to add it to datasets.yml along with the relative path to where the dataset is stored, and then provide the name set in the YML file to the --dataset flag.

You'll need to make sure your dataset matches the resolution of the architecture you're training. The default architectures for FFHQ and LSUN are set up for 256x256 images, so I'd recommend starting there before moving to other resolutions.
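If your images aren't 256x256 yet, one way to prepare them is a center crop followed by a resize. This is a minimal sketch, assuming Pillow is installed; the function names are invented for illustration and are not part of the repository, and the data loader may already do some resizing of its own, so check data_utils.py first.

```python
def center_crop_box(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def resize_to_square(src_path, dst_path, size=256):
    """Center-crop an image to a square and resize it to size x size.

    Hypothetical helper; Pillow is imported lazily so that
    center_crop_box stays usable without it.
    """
    from PIL import Image

    with Image.open(src_path) as img:
        img = img.convert("RGB")
        img = img.crop(center_crop_box(*img.size))
        img = img.resize((size, size), Image.LANCZOS)
        img.save(dst_path)
```

Center cropping discards the edges of non-square images; padding instead would preserve the whole image but adds borders, so which is better depends on your data.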

peterhessey • Sep 30 '22 09:09

Thanks for the information. After a quick trial, I found that a couple of things are missing, which causes training with a custom dataset to fail.

  1. Missing parameter to get_data_loaders: https://github.com/samb-t/unleashing-transformers/blob/40a243275048e7eb9d753e1518f582f59f2686a8/train_vqgan.py#L22 Here, custom_dataset_path is not passed to get_data_loaders.
  2. Class HparamsVQGAN has no default config for the custom dataset: https://github.com/samb-t/unleashing-transformers/blob/40a243275048e7eb9d753e1518f582f59f2686a8/hparams/defaults/vqgan_defaults.py#L15

train_sampler.py has the same issue.
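To illustrate the first point, here is a self-contained sketch of the kind of change that seems to be needed. The `get_data_loaders` below is a hypothetical stand-in, not the repository's real function or signature, and the `hparams` attribute names other than custom_dataset_path are assumptions as well.

```python
from types import SimpleNamespace


def get_data_loaders(dataset, img_size, batch_size, custom_dataset_path=None):
    """Hypothetical stand-in for the repository's loader factory."""
    if dataset == "custom" and custom_dataset_path is None:
        raise ValueError("dataset='custom' requires custom_dataset_path")
    # A real implementation would build data loaders; this just echoes inputs.
    return {"dataset": dataset, "img_size": img_size, "path": custom_dataset_path}


# Assumed hyperparameter object; the path is a placeholder.
hparams = SimpleNamespace(
    dataset="custom",
    img_size=256,
    batch_size=4,
    custom_dataset_path="/path/to/images",
)


def load_without_path(h):
    # Mirrors the reported problem: the path is never forwarded.
    return get_data_loaders(h.dataset, h.img_size, h.batch_size)


def load_with_path(h):
    # Sketch of the fix: forward the path explicitly.
    return get_data_loaders(
        h.dataset,
        h.img_size,
        h.batch_size,
        custom_dataset_path=h.custom_dataset_path,
    )
```

With this stand-in, `load_without_path(hparams)` raises a ValueError while `load_with_path(hparams)` succeeds. The second point would presumably need a similar addition of a custom entry to the defaults in HparamsVQGAN, which is not sketched here.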

eeyrw • Sep 30 '22 15:09