Training custom_vqgan: ConfigAttributeError: Missing key logger

mathigatti opened this issue 3 years ago · 15 comments

Hi! I'm receiving an error while trying to train the model on a custom dataset in Colab. I created test.txt and training.txt and modified the paths in custom_vqgan.yaml. I don't know if the image dimensions need to be fixed to something specific; the ones I'm using are RGB JPEGs of size 432x288.

I run this: !python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,

And I get this:

Global seed set to 23
Running on GPUs 0,
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100% 528M/528M [00:19<00:00, 28.7MB/s]
Downloading vgg_lpips model from https://heibox.uni-heidelberg.de/f/607503859c864bc1b30b/?dl=1 to taming/modules/autoencoder/lpips/vgg.pth
8.19kB [00:00, 435kB/s]        
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
  File "main.py", line 462, in <module>
    logger_cfg = lightning_config.logger or OmegaConf.create()
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 354, in __getattr__
    key=key, value=None, cause=e, type_override=ConfigAttributeError
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/base.py", line 196, in _format_and_raise
    type_override=type_override,
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/_utils.py", line 821, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/_utils.py", line 719, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set end OC_CAUSE=1 for full backtrace
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 438, in _get_impl
    node = self._get_node(key=key, throw_on_missing_key=True)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 470, in _get_node
    raise ConfigKeyError(f"Missing key {key}")
omegaconf.errors.ConfigAttributeError: Missing key logger
    full_key: logger
    object_type=dict

mathigatti commented Jul 05 '21

I ran into this as well, and determined it was because I didn't install the versions of PyTorch, Lightning, OmegaConf, etc. from environment.yaml. The code looks for a lightning_config section in your config file, and I guess older versions of OmegaConf fell back to default values when it wasn't there:

https://github.com/CompVis/taming-transformers/blob/master/main.py#L346-L357

The comments there claim the section is "optional", but it appears to be required with updated packages. I got past this immediate issue by following the directions to install the dependencies from environment.yaml (however, I then ran into other issues I don't remember and gave up, so I'm not sure whether you'd hit the same problem).

EDIT: Another option is to modify the 3-4 lines that look like this:

logger_cfg = lightning_config.logger or OmegaConf.create()

To simply:

logger_cfg = OmegaConf.create()
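
If you'd rather keep the ability to set these sections from your config instead of hard-coding empties, a minimal drop-in sketch for those main.py lines (assuming lightning_config is an OmegaConf DictConfig, as it is in main.py):

from omegaconf import OmegaConf

# DictConfig.get() returns a default instead of raising on an absent key,
# so config overrides keep working on newer omegaconf versions.
# The same pattern applies to the modelcheckpoint and callbacks lookups.
logger_cfg = lightning_config.get("logger", None) or OmegaConf.create()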

areiner-novetta commented Jul 07 '21

Btw, I revisited this, and my EDIT above "worked" with the latest packages once I also commented out the config.pretty() print lines in main.py (they're just print/debug statements):

  • https://github.com/CompVis/taming-transformers/blob/master/main.py#L194
  • https://github.com/CompVis/taming-transformers/blob/master/main.py#L199

I put "worked" in quotes because I haven't been able to get it to try to use less than 100 GB of video memory (and thus crashes immediately), but at least it gets to the point that it's trying.

I'm sure it'd be better, and give me more control, to simply include a lightning section in my config file, but I haven't found any good examples of one.
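
For anyone searching later: main.py reads a top-level lightning: block with optional trainer, logger, modelcheckpoint, and callbacks sub-sections. The key names below come from the lookups in main.py; the values are an untested, illustrative sketch, not a recommended configuration:

lightning:
  trainer:
    accumulate_grad_batches: 1
  logger:
    target: pytorch_lightning.loggers.TestTubeLogger
    params:
      name: testtube
      save_dir: logs
  modelcheckpoint:
    params:
      save_top_k: 3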

etotheipi commented Jul 09 '21

Hey, thank you very much for the answer. Did you try those solutions on Colab? I'm trying them but they didn't work. About the 100 GB of video memory: based on these comments I thought it would work with a smaller graphics card. Maybe by reducing the batch size?

mathigatti commented Jul 09 '21

I haven't tried this on Colab. I'm using an RTX 3090 with 24 GB of video RAM. My original problem was that environment.yaml installed a version of PyTorch not compatible with the sm_86 architecture (the 3090), but then I got these missing-key errors when I upgraded.

Ignore my comment about 100 GB. I had misunderstood one of the config options, and it was trying to build the network to process images of shape (batch, 3, 18000, 18000); that's why it wanted 100 GB+. With that straightened out, it's training now!
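
For intuition, a rough back-of-the-envelope on why that shape explodes memory (float32, activations only; the helper below is just illustrative arithmetic):

# Rough sketch: memory of a single float32 activation tensor.
def tensor_gb(shape, bytes_per_element=4):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_element / 1024**3

print(tensor_gb((1, 3, 18000, 18000)))   # ~3.6 GB: one raw input image
print(tensor_gb((1, 32, 18000, 18000)))  # ~38.6 GB: same image after a ch=32 conv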

For reference, it's now processing batches of size (256, 3, 64, 64) and the process is consuming about 18.6 GB of video RAM. It's using the following (abridged) config:

# Abridged config for reference
model:
  target: taming.models.vqgan.VQModel
  params:
    embed_dim: 128
    n_embed: 96
    ddconfig:
      z_channels: 64
      resolution: 64
      ch: 32
      ch_mult: [1,1,2,2,4]
      num_res_blocks: 2
      attn_resolutions: [16] 
      ...
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 256
    num_workers: 16
    train:
      target: taming.data.custom.CustomTrain
      params:
        training_images_list_file: train.txt
        size: 64
    validation:
      ...
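
One caveat if you shrink batch_size to fit a smaller card: main.py sets the learning rate as accumulate_grad_batches * ngpu * bs * base_lr, so the effective LR scales down with the batch size automatically. A sketch of that calculation (base_learning_rate 4.5e-6 is the value shipped in custom_vqgan.yaml; treat the other numbers as assumptions):

# Sketch of the LR scaling main.py applies (values illustrative).
base_lr = 4.5e-6             # model.base_learning_rate in the config
ngpu = 1                     # GPUs passed via --gpus
bs = 256                     # data.params.batch_size
accumulate_grad_batches = 1  # lightning.trainer.accumulate_grad_batches

learning_rate = accumulate_grad_batches * ngpu * bs * base_lr
print(f"lr = {learning_rate:.2e}")  # ~1.15e-03 with these values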

areiner-novetta commented Jul 10 '21

(Quoting areiner-novetta's workaround above: replace logger_cfg = lightning_config.logger or OmegaConf.create() with logger_cfg = OmegaConf.create().)

Hi, thanks for your solution! After changing the logger_cfg line, I'm getting this error:

Traceback (most recent call last):
  File "taming-transformers/main.py", line 520, in <module>
    trainer = Trainer.from_argparse_args(trainer_opt, **trainer_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/properties.py", line 421, in from_argparse_args
    return from_argparse_args(cls, args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/argparse.py", line 52, in from_argparse_args
    return cls(**trainer_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 40, in insert_env_defaults
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 422, in __init__
    max_time,
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 52, in on_trainer_init
    self._configure_checkpoint_callbacks(checkpoint_callback)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 77, in _configure_checkpoint_callbacks
    raise MisconfigurationException(error_msg)
pytorch_lightning.utilities.exceptions.MisconfigurationException: Invalid type provided for checkpoint_callback: Expected bool but received <class 'pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint'>. Pass callback instances to the callbacks argument in the Trainer constructor instead.

Has anyone solved this?

shanygepner commented Aug 15 '21

Just check that your library versions are right: pytorch-lightning==1.0.8 and omegaconf==2.0.0.

hjq133 commented Aug 18 '21

Just check that your library versions are right: pytorch-lightning==1.0.8 and omegaconf==2.0.0.

Unfortunately it's not that simple. That version of PyTorch (and/or Lightning) is not compatible with Ampere GPUs (in my case, an NVIDIA RTX 3090). I had no choice but to upgrade everything and attempt to fix the errors.

    self._configure_checkpoint_callbacks(checkpoint_callback)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 77, in _configure_checkpoint_callbacks
    raise MisconfigurationException(error_msg)
pytorch_lightning.utilities.exceptions.MisconfigurationException: Invalid type provided for checkpoint_callback: Expected bool but received <class 'pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint'>. Pass callback instances to the callbacks argument in the Trainer constructor instead.

Has anyone solved this?

It's been a while since I got this working (and I can't remember how well it worked), but I did eventually get something running. For the checkpoint issue: apparently that Lightning Trainer argument is now a boolean. The other big change is at the end of this diff, where a particular argument disappeared because its behavior became the function's default.

diff --git a/main.py b/main.py
index 7b4f94c..e595a5c 100644
--- a/main.py
+++ b/main.py
@@ -191,12 +191,12 @@ class SetupCallback(Callback):
             os.makedirs(self.cfgdir, exist_ok=True)
 
             print("Project config")
-            print(self.config.pretty())
+            #print(self.config.pretty())
             OmegaConf.save(self.config,
                            os.path.join(self.cfgdir, "{}-project.yaml".format(self.now)))
 
             print("Lightning config")
-            print(self.lightning_config.pretty())
+            #print(self.lightning_config.pretty())
             OmegaConf.save(OmegaConf.create({"lightning": self.lightning_config}),
                            os.path.join(self.cfgdir, "{}-lightning.yaml".format(self.now)))
@@ -459,19 +463,21 @@ if __name__ == "__main__":
             },
         }
         default_logger_cfg = default_logger_cfgs["testtube"]
-        logger_cfg = lightning_config.logger or OmegaConf.create()
+        logger_cfg = OmegaConf.create()
         logger_cfg = OmegaConf.merge(default_logger_cfg, logger_cfg)
         trainer_kwargs["logger"] = instantiate_from_config(logger_cfg)
 
         # modelcheckpoint - use TrainResult/EvalResult(checkpoint_on=metric) to
         # specify which metric is used to determine best models
         default_modelckpt_cfg = {
-            "target": "pytorch_lightning.callbacks.ModelCheckpoint",
+            "target": "pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint",
             "params": {
                 "dirpath": ckptdir,
                 "filename": "{epoch:06}",
                 "verbose": True,
                 "save_last": True,
@@ -479,9 +485,9 @@ if __name__ == "__main__":
             default_modelckpt_cfg["params"]["monitor"] = model.monitor
             default_modelckpt_cfg["params"]["save_top_k"] = 3
 
-        modelckpt_cfg = lightning_config.modelcheckpoint or OmegaConf.create()
+        modelckpt_cfg = OmegaConf.create()
         modelckpt_cfg = OmegaConf.merge(default_modelckpt_cfg, modelckpt_cfg)
-        trainer_kwargs["checkpoint_callback"] = instantiate_from_config(modelckpt_cfg)
+        trainer_kwargs["checkpoint_callback"] = True
 
         # add callback which sets up log directory
         default_callbacks_cfg = {
@@ -512,8 +518,9 @@ if __name__ == "__main__":
                     #"log_momentum": True
                 }
             },
+            "checkpointer": modelckpt_cfg,
         }
-        callbacks_cfg = lightning_config.callbacks or OmegaConf.create()
+        callbacks_cfg = OmegaConf.create()
         callbacks_cfg = OmegaConf.merge(default_callbacks_cfg, callbacks_cfg)
         trainer_kwargs["callbacks"] = [instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg]
 
@@ -533,7 +540,7 @@ if __name__ == "__main__":
             ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
         else:
             ngpu = 1
-        accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches or 1
+        accumulate_grad_batches = 1
         print(f"accumulate_grad_batches = {accumulate_grad_batches}")
         lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
         model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr
diff --git a/taming/models/vqgan.py b/taming/models/vqgan.py
index 121d01f..1858fd4 100644
--- a/taming/models/vqgan.py
+++ b/taming/models/vqgan.py
@@ -333,7 +333,8 @@ class GumbelVQ(VQModel):
 
     def validation_step(self, batch, batch_idx):
         x = self.get_input(batch, self.image_key)
-        xrec, qloss = self(x, return_pred_indices=True)
+        xrec, qloss = self(x)
         aeloss, log_dict_ae = self.loss(qloss, x, xrec, 0, self.global_step,
                                         last_layer=self.get_last_layer(), split="val")
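
For context, the checkpoint part of the diff does exactly what the MisconfigurationException asks: checkpoint_callback is reduced to a plain bool, and the ModelCheckpoint config is routed through the callbacks dict instead. In a standalone script on a newer pytorch-lightning, the equivalent pattern would look roughly like this (a sketch only; the dirpath and monitor values are illustrative, not taken from the repo):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Pass the ModelCheckpoint instance via `callbacks`, not `checkpoint_callback`.
checkpoint_cb = ModelCheckpoint(
    dirpath="logs/checkpoints",  # illustrative path
    filename="{epoch:06}",
    save_last=True,
    save_top_k=3,
    monitor="val/rec_loss",      # illustrative metric name
)
trainer = Trainer(callbacks=[checkpoint_cb], gpus=1)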

areiner-novetta commented Aug 18 '21

Got this issue, and reinstalling pytorch-lightning==1.0.8 and omegaconf==2.0.0 fixed the problem. Note that these versions differ from the ones in requirements.txt.

soon-yau commented Nov 24 '21

Thank you, pip install pytorch-lightning==1.0.8 omegaconf==2.0.0 helped. For some reason conda env create -f environment.yaml didn't install some of the required libs.

thepowerfuldeez commented Jan 20 '22

If you're on omegaconf==2.3.0 (or thereabouts), modify the lines that look like this, from:

logger_cfg = lightning_config.logger or OmegaConf.create()

to this:

if "logger" in lightning_config:
    logger_cfg = lightning_config.logger
else:
    logger_cfg = OmegaConf.create()
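
The same guard is presumably needed for the modelcheckpoint and callbacks lookups a few lines further down in main.py, e.g.:

modelckpt_cfg = lightning_config.modelcheckpoint if "modelcheckpoint" in lightning_config else OmegaConf.create()
callbacks_cfg = lightning_config.callbacks if "callbacks" in lightning_config else OmegaConf.create()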

kuguazhiwang commented Mar 06 '23

Hi, I need omegaconf>=2.0.6,<=2.1 for other dependencies, and I get this error when using omegaconf==2.1:

    raise ConfigKeyError(f"Missing key {key}")
omegaconf.errors.ConfigAttributeError: Missing key config
    full_key: config
    object_type=dict

Has anyone fixed the error?

thanks

Amnah1100 commented May 17 '23

Thank you, pip install pytorch-lightning==1.0.8 omegaconf==2.0.0 helped. For some reason conda env create -f environment.yaml didn't install some of the required libs.

This was helpful and resolved the issue. Simple and nice solution.

qm-intel commented Oct 21 '23