
The validation loss is rising and fluctuating, is that a regular situation?

Open XintaoZhao0805 opened this issue 4 years ago • 25 comments

Greetings, and thanks for such a good project. In my experiment I used the same VCTK dataset as yours, and I have only trained for 68,000 steps so far. The log of my experiment looks like this:

[screenshot: training and validation loss curves]

I noticed that the validation loss is rising and the training loss also has some fluctuating peaks. Is it a normal phenomenon?

thank you in advance :)

XintaoZhao0805 avatar Sep 19 '20 07:09 XintaoZhao0805

The provided training data is very small; it is for code verification purposes only.

auspicious3000 avatar Sep 20 '20 13:09 auspicious3000

> The provided training data is very small; it is for code verification purposes only.

Indeed. But the image above comes from an experiment using my own VCTK data. There were 20 speakers in my VCTK corpus; 80% of the utterances were used for training and 10% for validation. I organized the data in the same structure as the provided .pkl file. Did something go wrong in my experiment?

XintaoZhao0805 avatar Sep 20 '20 14:09 XintaoZhao0805

There might be something wrong with your validation data. The validation loss should be around 30.

auspicious3000 avatar Sep 20 '20 15:09 auspicious3000

> There might be something wrong with your validation data. The validation loss should be around 30.

Thanks for your answer. I will check my preprocessing code. By the way, is it correct to generate the validation data in the same structure as your demo.pkl? Like this:

[Speaker_Name , One-hot , [Mel, normed-F0, length, utterance_name] ]

This is what I am doing now.
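For what it's worth, a minimal sketch of building one entry in that structure (field names, shapes, and the helper are my own assumptions based on the format above, not the official preprocessing code):

```python
import pickle
import numpy as np

def make_entry(speaker_name, speaker_idx, num_speakers, mel, f0_normed, utt_name):
    """Build one [speaker_name, one-hot, [mel, normed-F0, length, utt_name]] entry.

    mel: (T, 80) normalized mel spectrogram; f0_normed: (T,) F0 normalized to [0, 1].
    """
    one_hot = np.zeros(num_speakers, dtype=np.float32)
    one_hot[speaker_idx] = 1.0
    return [speaker_name, one_hot, [mel, f0_normed, len(mel), utt_name]]

# Toy example: one 100-frame utterance for speaker 0 of 20.
mel = np.random.rand(100, 80).astype(np.float32)
f0 = np.random.rand(100).astype(np.float32)
entry = make_entry('p225', 0, 20, mel, f0, 'p225_001')

with open('demo_val.pkl', 'wb') as f:
    pickle.dump([entry], f)
```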

Thanks again for your answer!

XintaoZhao0805 avatar Sep 21 '20 00:09 XintaoZhao0805

The format is correct.

auspicious3000 avatar Sep 21 '20 14:09 auspicious3000

Did you normalize the Mel spectrogram? What's the range of the Mel spec?

inconnu11 avatar Sep 21 '20 14:09 inconnu11

Hi, I would be grateful if you could tell me how to restructure demo.pkl. Thanks :) https://github.com/auspicious3000/SpeechSplit/blob/10ed8b9e25cce6c9a077e27ca175ba696b7df597/solver.py#L16

niu0717 avatar Oct 19 '20 06:10 niu0717

Can you make the right demo.pkl file?

c1a1o1 avatar Oct 21 '20 07:10 c1a1o1

@c1a1o1 Please clearly state your question and create a new issue. Please do NOT flood other issues.

auspicious3000 avatar Oct 21 '20 15:10 auspicious3000

Thanks for the good paper and project. My experiment is similar to Buckingham's. The validation loss fluctuates around 100 after 30K iterations without further improvement. I haven't figured out what is wrong with my experiment. Any suggestions would be greatly appreciated. Thanks. [screenshot: loss curves]

jamesliu avatar Nov 08 '20 06:11 jamesliu

@jamesliu This looks like over-fitting to me. Make sure you use a large training set and the validation speakers are in the training set.

auspicious3000 avatar Nov 09 '20 01:11 auspicious3000

@auspicious3000 Yes. Thank you for pointing this out. After starting to use the full P226 and P231 data from the VCTK corpus, the training and validation charts look reasonable now, but the reconstruction from the trained model is not good for the demo data. How many iterations are needed to get good results on the demo data? Do I need to run more iterations? How about using another optimizer instead of Adam? Thanks.

G training/validation charts

[screenshot: G training/validation loss curves]

reconstruction from model

[screenshot: reconstructed mel spectrogram]

jamesliu avatar Nov 12 '20 03:11 jamesliu

@jamesliu Your training set is actually very small, which has only 30 mins of data. Also, the "demo data" needs to be consistent with the training data.

auspicious3000 avatar Nov 12 '20 05:11 auspicious3000

@auspicious3000 Thanks for your suggestion. I have trained on 80 speakers (P225~P304) from the VCTK dataset (since the one-hot size is 80) on a 2080Ti GPU for 2 days. The result is better, but still not good enough. Is my training set big enough to avoid overfitting? How many days or iterations should I expect before getting a reasonable result? Any suggestions for the hyperparameters? Thanks. Please ignore the sudden rise of the validation error; I added more validation examples after 170K steps. [screenshot: loss curves]

hparams

Default hyperparameters:

hparams = HParams(
# synthesis
builder = 'wavenet',

# model   
freq = 8,
dim_neck = 8,
freq_2 = 8,
dim_neck_2 = 1,
freq_3 = 8,
dim_neck_3 = 32,

dim_enc = 512,
dim_enc_2 = 128,
dim_enc_3 = 256,

dim_freq = 80,
dim_spk_emb = 82,
dim_f0 = 257,
dim_dec = 512,
len_raw = 128,
chs_grp = 16,

# interp
min_len_seg = 19,
max_len_seg = 32,
min_len_seq = 64,
max_len_seq = 128,
max_len_pad = 192,

# data loader
root_dir = '/data/music/speech_split/assets/spmel',
feat_dir = '/data/music/speech_split/assets/raptf0',
batch_size = 128,
mode = 'train',
shuffle = True,
num_workers = 10,
samplier = 8,
#optimizer = 'RangerLars'
optimizer = 'Adam'

)
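As a side note, with dim_spk_emb = 82 but only 80 training speakers, the one-hot vectors must be zero-padded to the embedding dimension. A hedged sketch of such a helper (my own illustration, not repo code):

```python
import numpy as np

def speaker_one_hot(speaker_idx, num_speakers=80, dim_spk_emb=82):
    """One-hot speaker vector zero-padded to dim_spk_emb (here 82 > 80 speakers)."""
    assert speaker_idx < num_speakers <= dim_spk_emb
    vec = np.zeros(dim_spk_emb, dtype=np.float32)
    vec[speaker_idx] = 1.0
    return vec

emb = speaker_one_hot(3)  # vector of length 82 with a single 1 at index 3
```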

jamesliu avatar Nov 14 '20 15:11 jamesliu

There are many contributing factors to output quality. It is hard to tell from the information you provided. @jamesliu

auspicious3000 avatar Nov 16 '20 19:11 auspicious3000

Could you guys check my problem?

I am trying to achieve a very simple train-and-apply pipeline.

@jamesliu @Buckingham0805 @niu0717 @inconnu11 thank you very much for any help

https://github.com/auspicious3000/SpeechSplit/issues/28

FurkanGozukara avatar Jan 17 '21 13:01 FurkanGozukara

Hi James,

I followed your updates and wonder if you continued with the experiments. I plan to run some experiments with another dataset and would like to learn from your experiments thus far.

Did the results improve? Did you increase the training set size? Did you increase the training time? Did you change the hyperparameters? Did you try other techniques?

Thanks and hope to hear from you!

tejuafonja avatar Jan 23 '21 23:01 tejuafonja

> @auspicious3000 Thanks for your suggestion, I have trained the 80 speakers(P225~P304) in VCTK dataset(due to onehot size is 80) on 2080Ti GPU for 2 days … (hparams elided; quoted from jamesliu's comment above)

Hi, I noticed that most utterances in VCTK are much longer than the required input length. Did you process the validation set with something like MyCollator in data_loader?

3139725181 avatar Sep 27 '21 06:09 3139725181

I have the same question as @3139725181. How can we use longer audio files in the validation set?

anon-squid avatar Oct 21 '21 22:10 anon-squid

You can use longer audio. There is no limit on the length of input.

auspicious3000 avatar Oct 21 '21 22:10 auspicious3000

During the validation step of training, I get an error from pad_seq_to_2() because len_out=192 is smaller than x.shape[1].

anon-squid avatar Oct 22 '21 20:10 anon-squid

Right. All these lengths are hyperparameters that can be freely adjusted based on your own requirements.
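For example, a pad-or-truncate variant of that padding step could look like this sketch (modeled on the pad_seq_to_2 behavior described in the error above; the truncation branch and function name are my own, not repo code):

```python
import numpy as np

def pad_or_truncate(x, len_out=192):
    """Pad a (B, T, D) batch along time to len_out, truncating when T > len_out.

    The original pad_seq_to_2 fails when len_out < x.shape[1]; truncating
    instead lets longer validation utterances through.
    """
    if x.shape[1] >= len_out:
        return x[:, :len_out, :]
    pad = len_out - x.shape[1]
    return np.pad(x, ((0, 0), (0, pad), (0, 0)), mode='constant')

long_batch = np.ones((2, 300, 80), dtype=np.float32)
out = pad_or_truncate(long_batch, len_out=192)  # truncated to (2, 192, 80)
```

Alternatively, raising max_len_pad in the hparams achieves the same thing at the cost of more padding and memory.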

auspicious3000 avatar Oct 22 '21 20:10 auspicious3000

> @auspicious3000 Thanks for your suggestion, I have trained the 80 speakers(P225~P304) in VCTK dataset(due to onehot size is 80) on 2080Ti GPU for 2 days … (hparams elided; quoted from jamesliu's comment above)

Hi, I used the whole VCTK dataset to train, but the validation loss fluctuates around 70 and then rises. I wonder how you generated the validation data; is it generated the same way as the training data?

jixinya avatar Jul 13 '22 19:07 jixinya

@jixinya Yes, the validation data is just a separate partition of the training data.
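In other words, every speaker appears in both sets and only utterances are held out. A minimal sketch of such a partition (function and variable names are hypothetical):

```python
import random

def split_by_utterance(utts_by_speaker, val_frac=0.1, seed=0):
    """Hold out a fraction of each speaker's utterances for validation,
    so every validation speaker is also seen during training."""
    rng = random.Random(seed)
    train, val = {}, {}
    for spk, utts in utts_by_speaker.items():
        utts = utts[:]          # don't mutate the caller's lists
        rng.shuffle(utts)
        n_val = max(1, int(len(utts) * val_frac))
        val[spk] = utts[:n_val]
        train[spk] = utts[n_val:]
    return train, val

data = {'p225': ['p225_%03d' % i for i in range(20)],
        'p226': ['p226_%03d' % i for i in range(20)]}
train, val = split_by_utterance(data)
```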

auspicious3000 avatar Jul 14 '22 20:07 auspicious3000

> @auspicious3000 Thanks for your suggestion, I have trained the 80 speakers(P225~P304) in VCTK dataset(due to onehot size is 80) on 2080Ti GPU for 2 days … (hparams elided; quoted from jamesliu's comment above)

I also used all the utterances from p225–p246, but my validation loss oscillates upward. When I use the trained model for conversion it performs poorly, and, strangely, the content of the sentences changes. Do you know what is causing this?

9527950 avatar Mar 06 '23 01:03 9527950