taming-transformers
Training custom_vqgan: ConfigAttributeError: Missing key logger
Hi! I'm receiving an error while trying to train the model on a custom dataset in Colab. I created test.txt and training.txt and modified the paths in custom_vqgan.yaml. I don't know if the image dimensions need to be fixed to something specific; the ones I'm using are JPG RGB images of size 432x288.
I run this: !python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,
And I get this:
Global seed set to 23
Running on GPUs 0,
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100% 528M/528M [00:19<00:00, 28.7MB/s]
Downloading vgg_lpips model from https://heibox.uni-heidelberg.de/f/607503859c864bc1b30b/?dl=1 to taming/modules/autoencoder/lpips/vgg.pth
8.19kB [00:00, 435kB/s]
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
File "main.py", line 462, in <module>
logger_cfg = lightning_config.logger or OmegaConf.create()
File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 354, in __getattr__
key=key, value=None, cause=e, type_override=ConfigAttributeError
File "/usr/local/lib/python3.7/dist-packages/omegaconf/base.py", line 196, in _format_and_raise
type_override=type_override,
File "/usr/local/lib/python3.7/dist-packages/omegaconf/_utils.py", line 821, in format_and_raise
_raise(ex, cause)
File "/usr/local/lib/python3.7/dist-packages/omegaconf/_utils.py", line 719, in _raise
raise ex.with_traceback(sys.exc_info()[2]) # set end OC_CAUSE=1 for full backtrace
File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 351, in __getattr__
return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)
File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 438, in _get_impl
node = self._get_node(key=key, throw_on_missing_key=True)
File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 470, in _get_node
raise ConfigKeyError(f"Missing key {key}")
omegaconf.errors.ConfigAttributeError: Missing key logger
full_key: logger
object_type=dict
I ran into this as well, and determined it was because I didn't install the versions of PyTorch, Lightning, OmegaConf, etc. from environment.yaml. It's looking for a lightning_config section of your config file, and I guess older versions were able to use default values when it's not there:
https://github.com/CompVis/taming-transformers/blob/master/main.py#L346-L357
This claims it's "optional", but it appears to be required with updated packages. I was able to get past this immediate issue when I followed the directions to install dependencies from environment.yaml. (however, I ran into other issues I don't remember and gave up... not sure if you would have the same problem).
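For what it's worth, the version difference is easy to see in isolation. A minimal sketch, assuming a newer omegaconf like the one that produced the traceback above (on omegaconf 2.0.x the attribute access just returned None, which is why the "or OmegaConf.create()" idiom used to work):
# Minimal sketch of the behavior difference between omegaconf versions
from omegaconf import OmegaConf

cfg = OmegaConf.create({})  # like a config file with no "lightning" keys at all

try:
    # omegaconf 2.0.x: prints None; newer versions: raises ConfigAttributeError "Missing key logger"
    print(cfg.logger)
except Exception as e:
    print(type(e).__name__, e)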
EDIT: Another option is to modify the 3-4 lines that look like this:
logger_cfg = lightning_config.logger or OmegaConf.create()
To simply:
logger_cfg = OmegaConf.create()
Btw, I revisited this and my EDIT above "worked" with the latest packages when I also commented out the config.pretty() print lines in main.py (they're just print/debug statements):
- https://github.com/CompVis/taming-transformers/blob/master/main.py#L194
- https://github.com/CompVis/taming-transformers/blob/master/main.py#L199
I put "worked" in quotes because I haven't been able to get it to try to use less than 100 GB of video memory (and thus crashes immediately), but at least it gets to the point that it's trying.
I'm sure it'd be better, and give me more control, to simply include a lightning section in my config file, but I haven't found any good examples of it.
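For what it's worth, the keys main.py reads from that section (logger, modelcheckpoint, callbacks, trainer) are visible near the lines linked above. In the yaml it would sit under a top-level lightning: key; here's the rough shape written as an OmegaConf dict just to show the structure. This is a sketch with placeholder values, not a tested config:
# Rough sketch of the "lightning" section main.py looks for; values are placeholders
from omegaconf import OmegaConf

lightning = OmegaConf.create({
    "logger": {},            # merged over the default (testtube) logger config
    "modelcheckpoint": {},   # merged over the default ModelCheckpoint config
    "callbacks": {},         # merged over the default callbacks
    "trainer": {             # extra Trainer options
        "gpus": "0,",
        "accumulate_grad_batches": 1,
    },
})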
Hey, thank you very much for the answer. Did you try those solutions on Colab? I'm trying them but they didn't work. About the 100 GB of video memory: I thought it would work with a smaller graphics card based on these comments. Maybe by reducing the batch size?
I haven't tried this on Colab. I'm using an RTX 3090 with 24 GB of video RAM. My original problem was that environment.yaml installed a version of PyTorch not compatible with the sm_86 architecture (3090), but then I got these missing-key errors when I upgraded.
Ignore my comment about 100 GB. I misunderstood one of the config options and it was trying to build the network to process images with shape (batch, 3, 18000, 18000). That's why it was trying to use 100 GB+. With that straightened out, it's training now!
For reference, it's now processing batches of size (256, 3, 64, 64) and the process is consuming about 18.6 GB of video RAM. It's using the following (abridged) config:
# Abridged config for reference
model:
  target: taming.models.vqgan.VQModel
  params:
    embed_dim: 128
    n_embed: 96
    ddconfig:
      z_channels: 64
      resolution: 64
      ch: 32
      ch_mult: [1,1,2,2,4]
      num_res_blocks: 2
      attn_resolutions: [16]
      ...
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 256
    num_workers: 16
    train:
      target: taming.data.custom.CustomTrain
      params:
        training_images_list_file: train.txt
        size: 64
    validation:
      ...
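In case it helps with sizing, here's my rough understanding of how the latent grid follows from these settings. It's an assumption on my part, but it matches the "z of shape (1, 256, 16, 16)" line printed at startup: the encoder halves the spatial resolution once per ch_mult entry after the first.
# Rough sketch, assuming one downsampling step per ch_mult entry after the first
def latent_grid(resolution, ch_mult):
    num_down = len(ch_mult) - 1
    return resolution // (2 ** num_down)

# Default config: resolution 256, ch_mult [1,1,2,2,4], z_channels 256
# -> 16x16 grid, 256 * 16 * 16 = 65536 dims (the startup line quoted above)
print(latent_grid(256, [1, 1, 2, 2, 4]))  # 16

# The abridged config above: resolution 64, same ch_mult -> 4x4 grid
print(latent_grid(64, [1, 1, 2, 2, 4]))   # 4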
Hi, thanks for your solution! After changing logger_cfg, I'm getting this error:
Traceback (most recent call last):
File "taming-transformers/main.py", line 520
...
pytorch_lightning.utilities.exceptions.MisconfigurationException: Invalid type provided for checkpoint_callback: Expected bool but received <class 'pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint'>. Pass callback instances to the callbacks argument in the Trainer constructor instead.
Has anyone solved this?
Just check that your lib versions are right: pytorch-lightning==1.0.8 and omegaconf==2.0.0.
Unfortunately it's not that simple. That version of PyTorch (and/or Lightning) is not compatible with Ampere GPUs (in my case, an NVIDIA RTX 3090). I had no choice but to upgrade everything and attempt to fix the errors.
self._configure_checkpoint_callbacks(checkpoint_callback)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 77, in _configure_checkpoint_callbacks
raise MisconfigurationException(error_msg)
pytorch_lightning.utilities.exceptions.MisconfigurationException: Invalid type provided for checkpoint_callback: Expected bool but received <class 'pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint'>. Pass callback instances to the callbacks argument in the Trainer constructor instead.
Has anyone solved this?
It's been a while now since I got this working (and I can't remember how well it was working), but I did get something running eventually. For the checkpoint thing, apparently that lightning/trainer argument is now a boolean. The other big change is at the end of this diff, where a particular argument disappeared because it was made the default behavior of the function.
diff --git a/main.py b/main.py
index 7b4f94c..e595a5c 100644
--- a/main.py
+++ b/main.py
@@ -191,12 +191,12 @@ class SetupCallback(Callback):
os.makedirs(self.cfgdir, exist_ok=True)
print("Project config")
- print(self.config.pretty())
+ #print(self.config.pretty())
OmegaConf.save(self.config,
os.path.join(self.cfgdir, "{}-project.yaml".format(self.now)))
print("Lightning config")
- print(self.lightning_config.pretty())
+ #print(self.lightning_config.pretty())
OmegaConf.save(OmegaConf.create({"lightning": self.lightning_config}),
os.path.join(self.cfgdir, "{}-lightning.yaml".format(self.now)))
@@ -459,19 +463,21 @@ if __name__ == "__main__":
},
}
default_logger_cfg = default_logger_cfgs["testtube"]
- logger_cfg = lightning_config.logger or OmegaConf.create()
+ logger_cfg = OmegaConf.create()
logger_cfg = OmegaConf.merge(default_logger_cfg, logger_cfg)
trainer_kwargs["logger"] = instantiate_from_config(logger_cfg)
# modelcheckpoint - use TrainResult/EvalResult(checkpoint_on=metric) to
# specify which metric is used to determine best models
default_modelckpt_cfg = {
- "target": "pytorch_lightning.callbacks.ModelCheckpoint",
+ "target": "pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint",
"params": {
"dirpath": ckptdir,
"filename": "{epoch:06}",
"verbose": True,
"save_last": True,
@@ -479,9 +485,9 @@ if __name__ == "__main__":
default_modelckpt_cfg["params"]["monitor"] = model.monitor
default_modelckpt_cfg["params"]["save_top_k"] = 3
- modelckpt_cfg = lightning_config.modelcheckpoint or OmegaConf.create()
+ modelckpt_cfg = OmegaConf.create()
modelckpt_cfg = OmegaConf.merge(default_modelckpt_cfg, modelckpt_cfg)
- trainer_kwargs["checkpoint_callback"] = instantiate_from_config(modelckpt_cfg)
+ trainer_kwargs["checkpoint_callback"] = True
# add callback which sets up log directory
default_callbacks_cfg = {
@@ -512,8 +518,9 @@ if __name__ == "__main__":
#"log_momentum": True
}
},
+ "checkpointer": modelckpt_cfg,
}
- callbacks_cfg = lightning_config.callbacks or OmegaConf.create()
+ callbacks_cfg = OmegaConf.create()
callbacks_cfg = OmegaConf.merge(default_callbacks_cfg, callbacks_cfg)
trainer_kwargs["callbacks"] = [instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg]
@@ -533,7 +540,7 @@ if __name__ == "__main__":
ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
else:
ngpu = 1
- accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches or 1
+ accumulate_grad_batches = 1
print(f"accumulate_grad_batches = {accumulate_grad_batches}")
lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr
diff --git a/taming/models/vqgan.py b/taming/models/vqgan.py
index 121d01f..1858fd4 100644
--- a/taming/models/vqgan.py
+++ b/taming/models/vqgan.py
@@ -333,7 +333,8 @@ class GumbelVQ(VQModel):
def validation_step(self, batch, batch_idx):
x = self.get_input(batch, self.image_key)
- xrec, qloss = self(x, return_pred_indices=True)
+ xrec, qloss = self(x)
aeloss, log_dict_ae = self.loss(qloss, x, xrec, 0, self.global_step,
last_layer=self.get_last_layer(), split="val")
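One note on the accumulate_grad_batches bit near the end: the effective learning rate gets scaled by batch size and GPU count via model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr. With the batch size from my config above it works out roughly like this (the base learning rate value below is just an assumed example, use whatever your model config sets):
# Sanity check of: model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr
accumulate_grad_batches = 1
ngpu = 1
bs = 256           # batch_size from the data config above
base_lr = 4.5e-6   # assumed example value, not necessarily your config's value

print(accumulate_grad_batches * ngpu * bs * base_lr)  # ~0.00115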
Got this issue, and reinstalling pytorch-lightning==1.0.8 and omegaconf==2.0.0 fixed the problem. But the versions are different from those in requirements.txt.
Thank you, pip install pytorch-lightning==1.0.8 omegaconf==2.0.0 helped. For some reason conda env create -f environment.yaml didn't install some required libs.
Modify the lines that look like this if you're on omegaconf==2.3.0 (or thereabouts), from:
logger_cfg = lightning_config.logger or OmegaConf.create()
to this:
if "logger" in lightning_config:
    logger_cfg = lightning_config.logger
else:
    logger_cfg = OmegaConf.create()
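Or, as a one-liner (assuming your omegaconf version supports .get() with a default, which recent versions do). The same pattern can be applied to the modelcheckpoint and callbacks lookups if they raise the same error:
# same guard as a one-liner
logger_cfg = lightning_config.get("logger", None) or OmegaConf.create()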
Hi, I need to use omegaconf>=2.0.6,<=2.1 for other dependencies, and I get this error when using omegaconf==2.1:
raise ConfigKeyError(f"Missing key {key}")
omegaconf.errors.ConfigAttributeError: Missing key config
full_key: config
object_type=dict
Has anyone fixed this error?
Thanks,
Rakib
The pip install pytorch-lightning==1.0.8 omegaconf==2.0.0 suggestion was helpful and resolved it for me. Simple and nice solution.