openspeech

How to set configs in hydra_train.py

Open JinmingChe opened this issue 3 years ago • 4 comments

❓ Questions & Help

Hello, I am learning how to use openspeech, and I want to set configs in the Python file itself so that I can debug easily. The recommended method, however, is to pass parameters on the command line.

Details

I tried using configs.dataset = 'librispeech' in hydra_train.py instead of running python ./hydra_train.py dataset=librispeech, but it gives me the following error:

omegaconf.errors.ConfigAttributeError: Key 'dataset' is not in struct
    full_key: dataset
    object_type=dict

I would be grateful for any advice on this usage.

JinmingChe, Nov 28 '21 15:11

I think the problem is that configs.dataset should be a dictionary, not a string. If you want to change it in the Python script, I believe you should set configs.dataset.dataset = "librispeech".

If you want to do it from the command line, you can run:

python hydra_train.py dataset.dataset="librispeech"

OleguerCanal, Nov 29 '21 09:11

Can you show us the exact command you ran? It could be a syntax error in the command.

sooftware, Nov 30 '21 20:11

The following is my code:

    @hydra.main(config_path=os.path.join("..", "openspeech", "configs"), config_name="train")
    def hydra_main(configs: DictConfig) -> None:
        rank_zero_info(OmegaConf.to_yaml(configs))
        pl.seed_everything(configs.trainer.seed)

        # way 1
        configs['dataset'] = 'librispeech'
        # way 2
        configs.dataset = 'librispeech'

I tried both ways of adding configs.dataset, but both give me the same "not in struct" error:

Exception has occurred: ConfigKeyError
Key 'dataset' is not in struct
    full_key: dataset
    object_type=dict
The above exception was the direct cause of the following exception:
  File "/home/chenjinming/github/openspeech/openspeech_cli/hydra_train.py", line 44, in hydra_main
    configs['dataset'] = 'librispeech'

I also printed the configs:

    {'augment': {'apply_spec_augment': False, 'apply_noise_augment': False,
                 'apply_joining_augment': False, 'apply_time_stretch_augment': False,
                 'freq_mask_para': 27, 'freq_mask_num': 2, 'time_mask_num': 4,
                 'noise_dataset_dir': 'None', 'noise_level': 0.7,
                 'time_stretch_min_rate': 0.7, 'time_stretch_max_rate': 1.4},
     'trainer': {'seed': 1, 'accelerator': 'dp', 'accumulate_grad_batches': 1,
                 'num_workers': 4, 'batch_size': 32, 'check_val_every_n_epoch': 1,
                 'gradient_clip_val': 5.0, 'logger': 'wandb', 'max_epochs': 20,
                 'save_checkpoint_n_steps': 10000, 'auto_scale_batch_size': 'binsearch',
                 'sampler': 'smart', 'name': 'gpu', 'device': 'gpu', 'use_cuda': True,
                 'auto_select_gpus': True}}

It seems there is no 'dataset' key at all. My goal is to add a new key or change the default config settings instead of using the command line.

JinmingChe, Dec 01 '21 02:12

Did you try configs['dataset']['dataset'] = 'librispeech'?

configs['dataset'] has to be a dictionary whose keys are 'dataset', 'dataset_path', 'dataset_download', and 'manifest_file_path'.

It throws an error when you try configs['dataset'] = 'librispeech' because that value is a string, not a dict. Therefore it is removed from the configs, and you don't see it when you print it.

resurgo97, Dec 04 '21 13:12