Dreambooth-Stable-Diffusion

test_dataloader

Open thomasf1 opened this issue 2 years ago • 4 comments

After two epochs, I am getting this error:

pytorch_lightning.utilities.exceptions.MisconfigurationException: No test_dataloader() method defined to run Trainer.test.

Here is some more context:

Epoch 0, global step 499: val/loss_simple_ema was not in top 1
Epoch 0: 100%|█| 505/505 [09:33<00:00,  1.14s/it, loss=0.276, v_num=0, train/loss_simple_step=0.0151, train/loss_vlb_step=6.71e-5, Average Epoch time: 573.97 seconds                                                                                                   
Average Peak memory 35456.11MiB
Epoch 1:   0%| | 0/505 [00:00<?, ?it/s, loss=0.276, v_num=0, train/loss_simple_step=0.0151, train/loss_vlb_step=6.71e-5, train/loss_Data shape for DDIM sampling is (1, 4, 64, 64), eta 1.0
Running DDIM Sampling with 200 timesteps
DDIM Sampler: 100%|███████████████████████████████████████████████████████████████████████████████| 200/200 [00:22<00:00,  8.96it/s]
Data shape for DDIM sampling is (1, 4, 64, 64), eta 1.0███████████████████████████████████████████| 200/200 [00:22<00:00,  8.96it/s]
Running DDIM Sampling with 200 timesteps
DDIM Sampler: 100%|███████████████████████████████████████████████████████████████████████████████| 200/200 [00:29<00:00,  6.78it/s]
Epoch 1:   0%| | 1/505 [00:59<8:19:19, 59.44s/it, loss=0.275, v_num=0, train/loss_simple_step=0.0144, train/loss_vlb_step=6.2e-5, tr[W accumulate_grad.h:185] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [320, 320, 1, 1], strides() = [320, 1, 1, 1]
param.sizes() = [320, 320, 1, 1], strides() = [320, 1, 320, 320] (function operator())
Epoch 1:  59%|▌| 300/505 [06:13<04:15,  1.24s/it, loss=0.245, v_num=0, train/loss_simple_step=0.0778, train/loss_vlb_step=0.000256, Average Epoch time: 373.33 seconds
Average Peak memory 35567.64MiB
Epoch 1:  60%|▌| 301/505 [06:13<04:13,  1.24s/it, loss=0.245, v_num=0, train/loss_simple_step=0.0778, train/loss_vlb_step=0.000256, 
Saving latest checkpoint...

Traceback (most recent call last):
  File "main.py", line 835, in <module>
    trainer.test(model, data)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
    results = self._run(model, ckpt_path=self.tested_ckpt_path)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1128, in _run
    verify_loop_configurations(self)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 42, in verify_loop_configurations
    __verify_eval_loop_configuration(trainer, model, "test")
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 186, in __verify_eval_loop_configuration
    raise MisconfigurationException(f"No `{loader_name}()` method defined to run `Trainer.{trainer_method}`.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test`.
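For context, the traceback shows which check fires: Lightning's configuration validator refuses to run Trainer.test when neither the model nor the datamodule provides a test_dataloader. A rough, self-contained sketch of that check (not Lightning's actual code; class and function names here are illustrative only):

```python
# Hypothetical sketch of the validation implied by the traceback:
# before running Trainer.test, Lightning verifies that the model (or
# datamodule) defines test_dataloader(); otherwise it raises
# MisconfigurationException.
class MisconfigurationException(Exception):
    pass

def verify_eval_loop_configuration(model, stage="test"):
    loader_name = f"{stage}_dataloader"
    has_loader = callable(getattr(model, loader_name, None))
    if not has_loader:
        raise MisconfigurationException(
            f"No `{loader_name}()` method defined to run `Trainer.{stage}`."
        )

class TrainOnlyModel:
    # defines only a training loader, like the DreamBooth setup
    def train_dataloader(self):
        return []

try:
    verify_eval_loop_configuration(TrainOnlyModel())
except MisconfigurationException as e:
    print(e)  # -> No `test_dataloader()` method defined to run `Trainer.test`.
```

This matches the situation in the log: training and validation run fine for two epochs, and the exception only appears once main.py reaches the final trainer.test(model, data) call.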

thomasf1 avatar Sep 09 '22 15:09 thomasf1

I haven't encountered this, as I did not train that long. We do not have a test set, and there is no config for a test dataset either. The same is true for textual inversion, so maybe just remove anything that calls a test dataset (i.e., remove the trainer.test call)?

XavierXiao avatar Sep 09 '22 18:09 XavierXiao

It has something to do with the way the LDMs are written. I removed the trainer.test call from main.py and added a --no_test flag, but that didn't fix it; I still get the same crash.

yeswecan avatar Sep 10 '22 00:09 yeswecan

Running it with --no-test true worked, so that should probably be the default.

thomasf1 avatar Sep 10 '22 00:09 thomasf1

For me, --no-test true worked too. It adds to the confusion that the tool doesn't complain about unrecognized parameters, such as a misspelled no_test, or when the parameter value true is missing.
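This silent behavior is consistent with argparse's parse_known_args(), which collects unrecognized arguments instead of erroring out (whether this repo's main.py actually uses parse_known_args is an assumption). A small demonstration; str2bool here is a hypothetical helper mirroring the boolean-flag style:

```python
import argparse

def str2bool(v):
    # hypothetical helper: parse "true"/"false"-style flag values
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if v.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("Boolean value expected.")

parser = argparse.ArgumentParser()
parser.add_argument("--no-test", type=str2bool, nargs="?",
                    const=True, default=False)

# parse_known_args() shunts anything it does not recognize into `unknown`
# instead of raising an error, so a misspelled flag is silently ignored:
opt, unknown = parser.parse_known_args(["--no_test", "true"])
print(opt.no_test, unknown)   # -> False ['--no_test', 'true']

# The correctly spelled flag is parsed as expected:
opt = parser.parse_args(["--no-test", "true"])
print(opt.no_test)            # -> True
```

With parse_args() instead, the misspelled flag would abort immediately with an "unrecognized arguments" error, which is the louder behavior being asked for here.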

raimohanska avatar Jan 05 '23 11:01 raimohanska