Some GTA files missing after running train_tacotron.py --force_gta

aguazul opened this issue on Dec 07 '19 · 5 comments

Thanks for this project! :D

I've been training Tacotron for a few days now and it's up to 192K steps.

I ran train_tacotron.py --force_gta and it completed.

However, when I run train_wavernn.py --gta, it keeps saying that it can't find some of the files, and each time I run it, it complains about a different missing file. I have confirmed that the file path is correct and also that the files are actually missing. Is this caused by train_tacotron.py --force_gta not creating all the expected files? How do I get the missing files to be produced?
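
For reference, this is roughly how I checked which GTA mels are missing (just a sketch; the dataset.pkl layout of (item_id, mel_length) tuples and the gta folder location are assumptions from my setup, adjust for yours):

```python
import pickle
from pathlib import Path

# Assumed paths for my setup -- adjust to your own data folder.
data_path = Path(r'C:\Users\Brandon\Documents\WaveRNN-master\WaveRNN-master\JBDataset')
gta_path = data_path / 'gta'

# dataset.pkl is assumed to hold (item_id, mel_length) tuples.
with open(data_path / 'dataset.pkl', 'rb') as f:
    dataset = pickle.load(f)

# List every id from dataset.pkl that has no rendered GTA mel on disk.
missing = [item_id for item_id, mel_len in dataset
           if not (gta_path / f'{item_id}.npy').exists()]

print(f'{len(missing)} of {len(dataset)} GTA mels are missing')
for item_id in missing[:20]:
    print(' ', item_id)
```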

```
(pyGPUenv) C:\Users\Brandon\Documents\WaveRNN-master\WaveRNN-master>python train_wavernn.py --gta
Using device: cuda

Initialising Model...

Trainable Parameters: 4.234M
Restoring from latest checkpoint...
Loading latest weights: C:\Users\Documents\WaveRNN-master\WaveRNN-master\checkpoints\ljspeech_mol.wavernn\latest_weights.pyt
Loading latest optimizer state: C:\Users\Documents\WaveRNN-master\WaveRNN-master\checkpoints\ljspeech_mol.wavernn\latest_optim.pyt
+-------------+------------+--------+--------------+-----------+
| Remaining   | Batch Size | LR     | Sequence Len | GTA Train |
+-------------+------------+--------+--------------+-----------+
| 1000k Steps | 32         | 0.0001 | 1375         | True      |
+-------------+------------+--------+--------------+-----------+

Traceback (most recent call last):
  File "train_wavernn.py", line 159, in <module>
    main()
  File "train_wavernn.py", line 85, in main
    voc_train_loop(paths, voc_model, loss_func, optimizer, train_set, test_set, lr, total_steps)
  File "train_wavernn.py", line 105, in voc_train_loop
    for i, (x, y, m) in enumerate(train_set, 1):
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\torch\utils\data\dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Documents\WaveRNN-master\WaveRNN-master\utils\dataset.py", line 27, in __getitem__
    m = np.load(self.mel_path/f'{item_id}.npy')
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\numpy\lib\npyio.py", line 415, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Documents\WaveRNN-master\WaveRNN-master\JBDataset\gta\HHW Lesson 7 Living a Values Driven Life_200.npy'

(pyGPUenv) C:\Users\Documents\WaveRNN-master\WaveRNN-master>
```

I'm training on Windows 10 with PyTorch 1.3 and CUDA 10.

Thank you :)

aguazul avatar Dec 07 '19 18:12 aguazul

I figured it out. The get_tts_datasets function ignores any sample whose mel is longer than tts_max_mel_len, which is set to 1250 in hparams.py. I increased this to 3348, one greater than the longest sample in my dataset. Now when I run --force_gta it includes all files, even the longer ones.
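
For anyone else hitting this, the length check in get_tts_datasets looks roughly like this (a paraphrased sketch, not the exact code from utils/dataset.py; the dataset.pkl layout is an assumption):

```python
import pickle
from pathlib import Path

def filter_by_mel_len(data_path: Path, tts_max_mel_len: int):
    """Roughly what get_tts_datasets does with hp.tts_max_mel_len (paraphrased sketch)."""
    with open(data_path / 'dataset.pkl', 'rb') as f:
        dataset = pickle.load(f)              # assumed: list of (item_id, mel_length)

    dataset_ids, mel_lengths = [], []
    for item_id, mel_length in dataset:
        if mel_length <= tts_max_mel_len:     # longer samples are silently dropped
            dataset_ids.append(item_id)
            mel_lengths.append(mel_length)
    return dataset_ids, mel_lengths
```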

What is the benefit of excluding the longer samples? What effect does this have on training time, if any? And on the results?

Thanks!

aguazul avatar Dec 08 '19 23:12 aguazul

Basic attention mechanisms are not very robust when training with long input/output sequences. This becomes especially problematic if you have long training phrases that contain long pauses, which make the mapping between the input and output sequences harder for the network to learn.

The latest Google Tacotron paper (https://arxiv.org/abs/1910.10288) seems to offer solutions based on more sophisticated attention mechanisms.

oytunturk avatar Dec 09 '19 04:12 oytunturk

If you'd rather not change that hyperparameter and just ignore the longer samples, this could be useful:

https://www.gitmemory.com/issue/fatchord/WaveRNN/72/492741940

gabriel-souza-omni avatar Dec 24 '19 12:12 gabriel-souza-omni

As @aguazul stated, the problem is that long sample files are not filtered out for the vocoder. I fixed it for my dataset by changing line 40 in get_vocoder_datasets from:

```python
dataset_ids = [x[0] for x in dataset]
```

to

```python
dataset_ids = [x[0] for x in dataset if x[1] <= hp.tts_max_mel_len]
```
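
Alternatively, if you don't want to rely on hp.tts_max_mel_len matching whatever was used for --force_gta, you could filter on whether the GTA mel actually exists on disk. Untested sketch, assuming the 'gta' subfolder under the data path as in the traceback above:

```python
from pathlib import Path

# Untested sketch: keep only ids whose GTA mel was actually rendered.
# 'path' is the data directory passed to get_vocoder_datasets; the 'gta'
# subfolder name is an assumption based on the error message above.
gta_dir = Path(path) / 'gta'
dataset_ids = [x[0] for x in dataset if (gta_dir / f'{x[0]}.npy').exists()]
```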

cschaefer26 avatar Jan 06 '20 08:01 cschaefer26

Besides attention not being very robust for long sentences, the maximum number of decoder RNN time steps is (max_mel_len // reduction_factor), and increasing the number of RNN time steps increases VRAM usage.

That is, if the input sentence is too long, your GPU may run out of memory because there are too many time steps in the decoder RNN. In that case, you either have to reduce the batch size or set tts_max_mel_len to a lower value.
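
As a rough illustration of how tts_max_mel_len drives the decoder length (r = 2 here is only an assumed reduction factor; in this repo it comes from the tts_schedule in hparams.py and changes over training):

```python
# Rough illustration: decoder steps grow linearly with tts_max_mel_len.
# r = 2 is an assumed reduction factor -- check the tts_schedule in hparams.py.
r = 2
for max_mel_len in (1250, 3348):
    print(f'tts_max_mel_len={max_mel_len} -> up to {max_mel_len // r} decoder RNN steps')
```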

mindmapper15 avatar Jan 10 '20 05:01 mindmapper15