SpecVQGAN
How to retrain with a new dataset
Hi @v-iashin
If I want to retrain the model with a new dataset, such as LJSpeech, which .py file should I start with?
Hi, thank you for your question.
Assuming you are only considering training the first stage (autoencoder), here are a few hints:
- Create a dataset module similar to VGGSound in `specvqgan/data/vggsound.py`.
- Adapt the config `configs/vggsound_codebook.yaml` for your dataset.
- Then follow the instructions in "Training a spectrogram codebook".
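The first bullet above could be sketched roughly as follows. Everything here is an assumption-level sketch (class name, the 90/10 split, and the returned dict keys are mine, not from the repo), modeled loosely on what a `specvqgan/data/vggsound.py`-style dataset of `.npy` spectrograms needs:

```python
import os
import numpy as np

class LJSpeechSpecs:
    # Hypothetical dataset sketch: a map-style dataset over .npy mel
    # spectrograms. In the repo you would subclass torch.utils.data.Dataset;
    # a plain class with __len__/__getitem__ keeps this sketch torch-free.
    def __init__(self, split, spec_dir_path='./data/LJSpeech', crop_len=848):
        self.crop_len = crop_len
        self.spec_dir_path = spec_dir_path
        files = sorted(f for f in os.listdir(spec_dir_path) if f.endswith('.npy'))
        cut = int(0.9 * len(files))  # naive 90/10 split by filename order (assumption)
        self.files = files[:cut] if split == 'train' else files[cut:]

    def __len__(self):
        return len(self.files)

    def __getitem__(self, i):
        path = os.path.join(self.spec_dir_path, self.files[i])
        spec = np.load(path)            # expected shape: (mel_num, spec_len)
        spec = spec[:, :self.crop_len]  # crop the time axis to a fixed length
        return {'image': spec, 'file_path_': path}
```

A `torch.utils.data.DataLoader` accepts any object with `__len__` and `__getitem__`, so this duck-typed version behaves like a map-style dataset; the exact keys the training loop reads must be checked against the repo's config.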
Hi @v-iashin
Thank you for your reply.
I extracted mel spectrograms from the LJSpeech dataset with `tacotron2` and generated 13100 `.npy` files, but the `spec_len` of these spectrograms varies a lot, and only a small fraction of them, about 470 files, have a `spec_len` larger than 848 (the length your code requires).
What should I do to keep going? Please give me some hints. Thank you.
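One common workaround for this situation (my suggestion, not something from the SpecVQGAN code itself) is to pad the short spectrograms up to the target length instead of discarding them, so all 13100 files stay usable:

```python
import numpy as np

def fix_spec_len(spec, target_len=848, pad_value=None):
    """Pad or crop a (mel_num, T) mel spectrogram to exactly target_len frames.

    pad_value defaults to the spectrogram's own minimum, a rough stand-in
    for silence in log-mel space (an assumption; check how your extraction
    pipeline actually encodes silence).
    """
    mel_num, t = spec.shape
    if t >= target_len:
        return spec[:, :target_len]          # long enough: crop the tail
    if pad_value is None:
        pad_value = spec.min()
    pad = np.full((mel_num, target_len - t), pad_value, dtype=spec.dtype)
    return np.concatenate([spec, pad], axis=1)  # too short: pad the tail
```

Another option would be lowering `spec_len`/`spec_crop_len` in the config to match your data, but that changes the model's input resolution and may require adjusting the architecture settings as well.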
```
E:\ProgramData\anaconda3\envs\py39\python.exe C:\Users\User1\Downloads\SpecVQGAN-main\train.py
python train.py --base configs/LJSpeech_codebook.yaml -t True --gpus 0, 2024-03-06T20-39-39_LJSpeech_codebook
Global seed set to 23
Running on GPUs 0,
loaded pretrained LPAPS loss from specvqgan/modules/autoencoder/lpaps\vggishish16.pt
VQLPAPSWithDiscriminator running with hinge loss.
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: UserWarning: ModelCheckpoint(save_last=True, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
  warnings.warn(*args, **kwargs)
We will not save audio for conditioning and conditioning_rec
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Global seed set to 23
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:64851 (system error: 10049 - The requested address is not valid in its context.)
```
```
accumulate_grad_batches = 1
Setting learning rate to 1.35e-05 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 3 (batchsize) * 4.50e-06 (base_lr)
```

Project config:

```yaml
model:
  base_learning_rate: 4.5e-06
  target: specvqgan.models.vqgan.VQModel
  params:
    embed_dim: 256
    n_embed: 1024
    ddconfig:
      double_z: false
      z_channels: 256
      resolution: 848
      in_channels: 1
      out_ch: 1
      ch: 128
      ch_mult:
      - 1
      - 1
      - 2
      - 2
      - 4
      num_res_blocks: 2
      attn_resolutions:
      - 53
      dropout: 0.0
    lossconfig:
      target: specvqgan.modules.losses.vqperceptual.VQLPAPSWithDiscriminator
      params:
        disc_conditional: false
        disc_in_channels: 1
        disc_start: 30001
        disc_weight: 0.8
        codebook_weight: 1.0
        min_adapt_weight: 1.0
        max_adapt_weight: 1.0
        perceptual_weight: 1.0
data:
  target: train.SpectrogramDataModuleFromConfig
  params:
    batch_size: 3
    num_workers: 8
    spec_dir_path: ./data/LJSpeech
    sample_rate: 22050
    mel_num: 80
    spec_len: 860
    spec_crop_len: 848
    random_crop: false
    train:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTrain
      params:
        specs_dataset_cfg: null
    validation:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsValidation
      params:
        specs_dataset_cfg: null
    test:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTest
      params:
        specs_dataset_cfg: null
python: null
train:
  py: null
```
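The learning rate printed in the log is just the product of the factors shown there (accumulation steps, GPU count, batch size, and the config's `base_learning_rate`); a quick sanity check of that arithmetic:

```python
# Effective LR as the trainer computes it from the run above.
accumulate_grad_batches = 1
num_gpus = 1
batch_size = 3
base_lr = 4.5e-06  # base_learning_rate from the config

lr = accumulate_grad_batches * num_gpus * batch_size * base_lr
print('%.2e' % lr)  # prints 1.35e-05, matching the log line
```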
Lightning config:

```yaml
callbacks:
  image_logger:
    target: train.ImageLogger
    params:
      for_specs: true
      vocoder_cfg:
        target: train.VocoderMelGan
        params:
          ckpt_vocoder: ./vocoder/logs/vggsound/
trainer:
  sync_batchnorm: true
  distributed_backend: ddp
  gpus: 0,
```
```
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\omegaconf\basecontainer.py:225: UserWarning: cfg.pretty() is deprecated and will be removed in a future version. Use OmegaConf.to_yaml(cfg)
  warnings.warn(

  | Name            | Type                     | Params
-------------------------------------------------------
0 | encoder         | Encoder                  | 29.3 M
1 | decoder         | Decoder                  | 42.4 M
2 | loss            | VQLPAPSWithDiscriminator | 17.5 M
3 | quantize        | VectorQuantizer          | 262 K
4 | quant_conv      | Conv2d                   | 65.8 K
5 | post_quant_conv | Conv2d                   | 65.8 K
-------------------------------------------------------
74.9 M    Trainable params
14.7 M    Non-trainable params
89.6 M    Total params
358.463   Total estimated model params size (MB)

Epoch 0:   0%|          | 0/151 [00:00<?, ?it/s]
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: RuntimeWarning: You are using LearningRateMonitor callback with models that have no learning rate schedulers. Please see documentation for configure_optimizers method.
  warnings.warn(*args, **kwargs)
```
Hi @v-iashin Did I succeed?
Hi @v-iashin I am so sorry, I'm a novice and I may ask a lot of silly questions, please forgive me. Thank you. Can you tell me what these numbers mean?
These are the number of times each codebook code was used during the previous epoch, e.g. code No. 4 was used 48 times. You want these counts to be as uniform as possible (fewer zeros).
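One way to quantify "as uniform as possible" is to histogram the code indices the quantizer emits over an epoch and compute a usage perplexity. This helper is my sketch, not part of the SpecVQGAN repo:

```python
import numpy as np

def codebook_usage(indices, n_embed=1024):
    """Summarize how uniformly codebook codes were used over an epoch.

    indices: flat integer array of code ids emitted by the quantizer.
    Returns (per-code counts, number of dead codes, usage perplexity).
    Perplexity = exp(entropy of the usage distribution): it equals
    n_embed for perfectly uniform usage and 1 when a single code is
    used for everything.
    """
    counts = np.bincount(indices, minlength=n_embed)
    dead = int((counts == 0).sum())       # codes never used this epoch
    p = counts / counts.sum()
    p = p[p > 0]                          # drop zeros before taking logs
    entropy = -(p * np.log(p)).sum()
    return counts, dead, float(np.exp(entropy))
```

Tracking `dead` and the perplexity across epochs gives a single number to watch instead of eyeballing the raw counts.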
Hi @v-iashin Thank you for your reply
- Validation shows 0%. Is this because I did not add a validation set?
- Now at epoch 12, there seem to be more and more zeros. What can I do to reduce them?
- What were the loss and the epoch count for your pre-trained model?
Hi @v-iashin Thank you for your reply. I extracted mel spectrograms from the LJSpeech dataset with `tacotron2` and generated 13100 `.npy` files, but the `spec_len` of these spectrograms varies a lot, and only a small fraction of them, about 470 files, have a `spec_len` larger than 848 (the length your code requires). What should I do to keep going? Please give me some hints. Thank you.
Hi @v-iashin Can you give me some guidance on this problem?