SpeechSplit
The validation loss is rising and fluctuating, is that a regular situation?
Greetings, and thanks for such a good project. In my experiment, I used the same VCTK dataset as yours, and I have only trained for 68,000 steps so far. The log of my experiment looks like this:
I noticed that the validation loss is rising and the training loss also has some fluctuating peaks. Is this a normal phenomenon?
Thank you in advance :)
The provided training data is very small for code verification purposes only.
Indeed. But the image above comes from an experiment using my own VCTK data. There were 20 speakers in my VCTK corpus; 80% of the utterances were used for training and 10% for validation. I organized the data in the same structure as the provided .pkl file. Did something go wrong in my experiment?
There might be something wrong with your validation data. The validation loss should be around 30.
Thanks for your answer. I will check my preprocessing code. By the way, is it correct to generate the validation data in the same structure as in your demo.pkl?
Like this:
[Speaker_Name , One-hot , [Mel, normed-F0, length, utterance_name] ]
This is how I am doing it now.
Thanks again for your answer!
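For reference, a minimal sketch of assembling one entry in that shape. The file names, paths, and feature shapes here are illustrative assumptions, not taken from the repo:

import pickle
import numpy as np

num_speakers = 20
spk_idx = 0
one_hot = np.zeros(num_speakers, dtype=np.float32)  # one-hot speaker ID
one_hot[spk_idx] = 1.0

# Hypothetical precomputed features for one utterance.
mel = np.load('assets/spmel/p225/p225_001.npy')   # (T, 80) mel spectrogram
f0 = np.load('assets/raptf0/p225/p225_001.npy')   # normalized F0 contour

# [Speaker_Name, One-hot, [Mel, normed-F0, length, utterance_name]]
entry = ['p225', one_hot, [mel, f0, len(mel), 'p225_001']]

with open('demo.pkl', 'wb') as f:
    pickle.dump([entry], f)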
The format is correct.
Did you normalize the Mel spectrogram? What's the range of the Mel spec?
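A quick way to inspect the range, assuming the metadata layout described above (a diagnostic sketch only; if the mels were normalized the same way as the provided data, the values should fall roughly within [0, 1]):

import pickle

with open('demo.pkl', 'rb') as f:
    metadata = pickle.load(f)

# Print the value range of each mel spectrogram in the pickle.
for spk_name, spk_emb, (mel, f0, length, uid) in metadata:
    print(uid, 'mel range:', mel.min(), mel.max())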
Hi, I would be grateful if you could tell me how to restructure the demo.pkl. Thanks :) https://github.com/auspicious3000/SpeechSplit/blob/10ed8b9e25cce6c9a077e27ca175ba696b7df597/solver.py#L16
Can you make the right demo.pkl file?
@c1a1o1 Please clearly state your question and create a new issue. Please do NOT flood other issues.
Thanks for the good paper and project. My experiment is similar to Buckingham's: the validation loss fluctuates around 100 after 30K iterations without further improvement. I haven't figured out what is wrong with my experiment. Any suggestions would be great. Thanks.
@jamesliu This looks like over-fitting to me. Make sure you use a large training set and that the validation speakers also appear in the training set.
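To make that concrete, here is a hedged sketch of a per-speaker split, so that every validation speaker is also seen during training (the data layout and split ratio are illustrative):

import random

def split_per_speaker(utts_by_speaker, train_ratio=0.9, seed=0):
    # utts_by_speaker: dict mapping speaker name -> list of utterance IDs.
    # Splitting within each speaker guarantees every validation speaker
    # also appears in the training partition.
    rng = random.Random(seed)
    train, val = {}, {}
    for spk, utts in utts_by_speaker.items():
        utts = list(utts)
        rng.shuffle(utts)
        k = int(len(utts) * train_ratio)
        train[spk], val[spk] = utts[:k], utts[k:]
    return train, val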
@auspicious3000 Yes, thank you for pointing this out. After switching to the full P226 and P231 sets from the VCTK corpus, the training and validation charts look reasonable now, but the reconstruction from the trained model is not good for the demo data. How many iterations does it take to get good results on the demo data? Do I need to run more iterations? How about using another optimizer instead of Adam? Thanks.
G training/validation charts
reconstruction from model
@jamesliu Your training set is actually very small; it contains only about 30 minutes of data. Also, the "demo data" needs to be consistent with the training data.
@auspicious3000 Thanks for your suggestion. I have now trained on 80 speakers (P225~P304) from the VCTK dataset (since the one-hot size is 80) on a 2080Ti GPU for 2 days. The results have become better, but are still not good enough. Is my training set big enough to avoid overfitting? How many days or iterations should I expect before getting a reasonable result? Any suggestions for the hyperparameters? Thanks.
Please ignore the sudden rise of errors on the validation set; I added more validation examples after 170K.
hparams
Default hyperparameters:
hparams = HParams(
# synthesis
builder = 'wavenet',
# model
freq = 8,
dim_neck = 8,
freq_2 = 8,
dim_neck_2 = 1,
freq_3 = 8,
dim_neck_3 = 32,
dim_enc = 512,
dim_enc_2 = 128,
dim_enc_3 = 256,
dim_freq = 80,
dim_spk_emb = 82,
dim_f0 = 257,
dim_dec = 512,
len_raw = 128,
chs_grp = 16,
# interp
min_len_seg = 19,
max_len_seg = 32,
min_len_seq = 64,
max_len_seq = 128,
max_len_pad = 192,
# data loader
root_dir = '/data/music/speech_split/assets/spmel',
feat_dir = '/data/music/speech_split/assets/raptf0',
batch_size = 128,
mode = 'train',
shuffle = True,
num_workers = 10,
samplier = 8,
#optimizer = 'RangerLars'
optimizer = 'Adam'
)
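One detail worth double-checking when training on a different number of speakers: the speaker one-hot vectors must have length dim_spk_emb. A minimal sketch, assuming the vectors are simply zero-padded up to dim_spk_emb (that padding scheme is my assumption, not confirmed by the repo):

import numpy as np

dim_spk_emb = 82   # from the hparams above
num_speakers = 80  # e.g. P225~P304

def speaker_one_hot(idx, dim=dim_spk_emb):
    # One-hot speaker ID, zero-padded so the vector length matches
    # the model's expected speaker-embedding dimension.
    v = np.zeros(dim, dtype=np.float32)
    v[idx] = 1.0
    return v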
There are many contributing factors to output quality. It is hard to tell from the information you provided. @jamesliu
Could you take a look at my problem?
I am trying to get a very simple train-and-apply workflow running.
@jamesliu @Buckingham0805 @niu0717 @inconnu11 Thank you very much for any help.
https://github.com/auspicious3000/SpeechSplit/issues/28
Hi James,
I followed your updates and wonder if you continued with the experiments. I plan to run some experiments with another dataset and would like to learn from your experiments thus far.
Did the results become better? Did you increase the training size? Did you increase the training time? Did you change the hyperparameters? Did you try out other techniques?
Thanks and hope to hear from you!
Hi, I noticed that most utterances in VCTK are too long relative to the required input length. Did you process the validation set the same way, e.g. with MyCollator in data_loader?
I have the same question as @3139725181. How can we use longer audio files in the validation set?
You can use longer audio. There is no limit on the length of input.
During the validation step of training, I get an error from pad_seq_to_2() because len_out=192 is smaller than x.shape[1].
Right. All these lengths are hyperparameters that can be freely adjusted based on your own requirements.
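In other words, if your utterances exceed max_len_pad, you can either raise max_len_pad in the hparams or truncate at load time. A hedged sketch of a pad-or-truncate helper (an illustration, not the repo's pad_seq_to_2):

import numpy as np

def pad_or_truncate(x, len_out=192):
    # x: (T, dim_freq) mel spectrogram. Pads with zeros up to len_out
    # frames, or truncates when the utterance is longer than len_out.
    if x.shape[0] >= len_out:
        return x[:len_out]
    pad = len_out - x.shape[0]
    return np.pad(x, ((0, pad), (0, 0)), mode='constant')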
Hi, I used the whole VCTK dataset to train, but the validation loss fluctuates around 70 and then rises. I wonder how you generated the validation data. Is it generated the same way as the training data?
@jixinya Yes, the validation data is just a separate partition of the training data.
I also used all the utterances from p225-p246, but my validation loss oscillates upwards. When I use the trained model for conversion, it works poorly, and strangely the content of the sentences has changed. Do you know what is causing this?