SpecVQGAN
How to retrain with a new dataset
Hi @v-iashin
If I want to retrain the model with a new dataset, such as LJSpeech, which .py file should I start with?
Hi, thank you for your question.
Assuming you are only considering training the first stage (autoencoder), here are a few hints:
- Create a dataset module similar to VGGSound in `specvqgan/data/vggsound.py`.
- Adapt the config `configs/vggsound_codebook.yaml` for your dataset.
- Then follow the instructions in "Training a spectrogram codebook".
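The first bullet above could be sketched roughly as follows. Everything here is an assumption-level sketch (class name, the 90/10 split, and the returned dict keys are mine, not from the repo), modeled loosely on what a `specvqgan/data/vggsound.py`-style dataset of `.npy` spectrograms needs:

```python
import os
import numpy as np

class LJSpeechSpecs:
    # Hypothetical dataset sketch: a map-style dataset over .npy mel
    # spectrograms. In the repo you would subclass torch.utils.data.Dataset;
    # a plain class with __len__/__getitem__ keeps this sketch torch-free.
    def __init__(self, split, spec_dir_path='./data/LJSpeech', crop_len=848):
        self.crop_len = crop_len
        self.spec_dir_path = spec_dir_path
        files = sorted(f for f in os.listdir(spec_dir_path) if f.endswith('.npy'))
        cut = int(0.9 * len(files))  # naive 90/10 split by filename order (assumption)
        self.files = files[:cut] if split == 'train' else files[cut:]

    def __len__(self):
        return len(self.files)

    def __getitem__(self, i):
        path = os.path.join(self.spec_dir_path, self.files[i])
        spec = np.load(path)            # expected shape: (mel_num, spec_len)
        spec = spec[:, :self.crop_len]  # crop the time axis to a fixed length
        return {'image': spec, 'file_path_': path}
```

A `torch.utils.data.DataLoader` accepts any object with `__len__` and `__getitem__`, so this duck-typed version behaves like a map-style dataset; the exact keys the training loop reads must be checked against the repo's config.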
Hi @v-iashin
Thank you for your reply.
I extracted mel spectrograms from the LJSpeech dataset with `tacotron2` and generated 13100 `.npy` files, but the `spec_len` of these spectrograms varies a lot, and only a small fraction of them, about 470 files, have a `spec_len` larger than 848 (the length your code requires).
What should I do to keep going? Please give me some hints. Thank you.
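One common workaround for this situation (my suggestion, not something from the SpecVQGAN code itself) is to pad the short spectrograms up to the target length instead of discarding them, so all 13100 files stay usable:

```python
import numpy as np

def fix_spec_len(spec, target_len=848, pad_value=None):
    """Pad or crop a (mel_num, T) mel spectrogram to exactly target_len frames.

    pad_value defaults to the spectrogram's own minimum, a rough stand-in
    for silence in log-mel space (an assumption; check how your extraction
    pipeline actually encodes silence).
    """
    mel_num, t = spec.shape
    if t >= target_len:
        return spec[:, :target_len]          # long enough: crop the tail
    if pad_value is None:
        pad_value = spec.min()
    pad = np.full((mel_num, target_len - t), pad_value, dtype=spec.dtype)
    return np.concatenate([spec, pad], axis=1)  # too short: pad the tail
```

Another option would be lowering `spec_len`/`spec_crop_len` in the config to match your data, but that changes the model's input resolution and may require adjusting the architecture settings as well.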
```
E:\ProgramData\anaconda3\envs\py39\python.exe C:\Users\User1\Downloads\SpecVQGAN-main\train.py
python train.py --base configs/LJSpeech_codebook.yaml -t True --gpus 0, 2024-03-06T20-39-39_LJSpeech_codebook
Global seed set to 23
Running on GPUs 0,
loaded pretrained LPAPS loss from specvqgan/modules/autoencoder/lpaps\vggishish16.pt
VQLPAPSWithDiscriminator running with hinge loss.
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: UserWarning: ModelCheckpoint(save_last=True, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
  warnings.warn(*args, **kwargs)
We will not save audio for conditioning and conditioning_rec
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Global seed set to 23
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:64851 (system error: 10049 - The requested address is not valid in its context.)
```
```
accumulate_grad_batches = 1
Setting learning rate to 1.35e-05 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 3 (batchsize) * 4.50e-06 (base_lr)
```

Project config:

```yaml
model:
  base_learning_rate: 4.5e-06
  target: specvqgan.models.vqgan.VQModel
  params:
    embed_dim: 256
    n_embed: 1024
    ddconfig:
      double_z: false
      z_channels: 256
      resolution: 848
      in_channels: 1
      out_ch: 1
      ch: 128
      ch_mult:
      - 1
      - 1
      - 2
      - 2
      - 4
      num_res_blocks: 2
      attn_resolutions:
      - 53
      dropout: 0.0
    lossconfig:
      target: specvqgan.modules.losses.vqperceptual.VQLPAPSWithDiscriminator
      params:
        disc_conditional: false
        disc_in_channels: 1
        disc_start: 30001
        disc_weight: 0.8
        codebook_weight: 1.0
        min_adapt_weight: 1.0
        max_adapt_weight: 1.0
        perceptual_weight: 1.0
data:
  target: train.SpectrogramDataModuleFromConfig
  params:
    batch_size: 3
    num_workers: 8
    spec_dir_path: ./data/LJSpeech
    sample_rate: 22050
    mel_num: 80
    spec_len: 860
    spec_crop_len: 848
    random_crop: false
    train:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTrain
      params:
        specs_dataset_cfg: null
    validation:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsValidation
      params:
        specs_dataset_cfg: null
    test:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTest
      params:
        specs_dataset_cfg: null
python: null
train:
  py: null
```
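The learning rate printed in the log is just the product of the factors shown there (accumulation steps, GPU count, batch size, and the config's `base_learning_rate`); a quick sanity check of that arithmetic:

```python
# Effective LR as the trainer computes it from the run above.
accumulate_grad_batches = 1
num_gpus = 1
batch_size = 3
base_lr = 4.5e-06  # base_learning_rate from the config

lr = accumulate_grad_batches * num_gpus * batch_size * base_lr
print('%.2e' % lr)  # prints 1.35e-05, matching the log line
```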
Lightning config:

```yaml
callbacks:
  image_logger:
    target: train.ImageLogger
    params:
      for_specs: true
      vocoder_cfg:
        target: train.VocoderMelGan
        params:
          ckpt_vocoder: ./vocoder/logs/vggsound/
trainer:
  sync_batchnorm: true
  distributed_backend: ddp
  gpus: 0,
```
```
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\omegaconf\basecontainer.py:225: UserWarning: cfg.pretty() is deprecated and will be removed in a future version. Use OmegaConf.to_yaml(cfg)
  warnings.warn(

  | Name            | Type                     | Params
-------------------------------------------------------
0 | encoder         | Encoder                  | 29.3 M
1 | decoder         | Decoder                  | 42.4 M
2 | loss            | VQLPAPSWithDiscriminator | 17.5 M
3 | quantize        | VectorQuantizer          | 262 K
4 | quant_conv      | Conv2d                   | 65.8 K
5 | post_quant_conv | Conv2d                   | 65.8 K
-------------------------------------------------------
74.9 M    Trainable params
14.7 M    Non-trainable params
89.6 M    Total params
358.463   Total estimated model params size (MB)

Epoch 0:   0%|          | 0/151 [00:00<?, ?it/s]
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: RuntimeWarning: You are using LearningRateMonitor callback with models that have no learning rate schedulers. Please see documentation for configure_optimizers method.
  warnings.warn(*args, **kwargs)
```
Hi @v-iashin Did I succeed?
Hi @v-iashin I am so sorry, I'm a novice and I may ask a lot of silly questions, please forgive me. Thank you. Can you tell me what these numbers mean?
These are the number of times each codebook code was used during the previous epoch, e.g. code No. 4 was used 48 times. You want these counts to be as uniform as possible (fewer zeros).
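One way to quantify "as uniform as possible" is to histogram the code indices the quantizer emits over an epoch and compute a usage perplexity. This helper is my sketch, not part of the SpecVQGAN repo:

```python
import numpy as np

def codebook_usage(indices, n_embed=1024):
    """Summarize how uniformly codebook codes were used over an epoch.

    indices: flat integer array of code ids emitted by the quantizer.
    Returns (per-code counts, number of dead codes, usage perplexity).
    Perplexity = exp(entropy of the usage distribution): it equals
    n_embed for perfectly uniform usage and 1 when a single code is
    used for everything.
    """
    counts = np.bincount(indices, minlength=n_embed)
    dead = int((counts == 0).sum())       # codes never used this epoch
    p = counts / counts.sum()
    p = p[p > 0]                          # drop zeros before taking logs
    entropy = -(p * np.log(p)).sum()
    return counts, dead, float(np.exp(entropy))
```

Tracking `dead` and the perplexity across epochs gives a single number to watch instead of eyeballing the raw counts.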
Hi @v-iashin Thank you for your reply
- Validation shows 0%. Is this because I did not add a validation set?
- Now at epoch 12, there seem to be more and more zeros. What can I do to reduce them?
- What were the loss and the epoch count for your pre-trained model?
Hi @v-iashin Thank you for your reply. I extracted mel spectrograms from the LJSpeech dataset with `tacotron2` and generated 13100 `.npy` files, but the `spec_len` of these spectrograms varies a lot, and only a small fraction of them, about 470 files, have a `spec_len` larger than 848 (the length your code requires). What should I do to keep going? Please give me some hints. Thank you.
Hi @v-iashin Can you give me some guidance on this problem?