
When training this model on our custom dataset, we constantly hit the error below: "Dimension out of range (expected to be in range of [-1, 0], but got 1)"

Open · Rajan-Mahato opened this issue 9 months ago · 11 comments

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1). We have converted our dataset to the LJSpeech format. Our custom dataset is on Hugging Face, named "procit008/small".

[rank0]:[W404 15:57:59.072467923 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
  File "/home/procit/procit/vits/train.py", line 290, in <module>
    main()
  File "/home/procit/procit/vits/train.py", line 50, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 340, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 296, in start_processes
    while not context.join():
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 215, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
    fn(i, *args)
  File "/home/procit/procit/vits/train.py", line 117, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/home/procit/procit/vits/train.py", line 137, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(train_loader):
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
    data = self._next_data()
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1480, in _next_data
    return self._process_data(data)
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1505, in _process_data
    data.reraise()
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/_utils.py", line 733, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/home/procit/procit/vits/env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
  File "/home/procit/procit/vits/data_utils.py", line 119, in __call__
    max_spec_len = max([x[1].size(1) for x in batch])
  File "/home/procit/procit/vits/data_utils.py", line 119, in <listcomp>
    max_spec_len = max([x[1].size(1) for x in batch])
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Rajan-Mahato avatar Apr 04 '25 10:04 Rajan-Mahato
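Editor's note: for readers unfamiliar with this error, the failing line in data_utils.py asks each spectrogram in the batch for its second dimension via x[1].size(1). If a spectrogram tensor ends up with only one dimension (for example, a malformed cached spectrogram), size(1) raises exactly this IndexError. A minimal sketch of the mechanics:

```python
import torch

spec_ok = torch.zeros(513, 83)   # expected layout: (freq_bins, frames)
print(spec_ok.size(1))           # 83 -- works fine

spec_bad = torch.zeros(83)       # malformed: only one dimension
try:
    spec_bad.size(1)
except IndexError as err:
    # Dimension out of range (expected to be in range of [-1, 0], but got 1)
    print(err)
```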

I think this error is due to your audio files: this model requires the audio to be mono, not stereo, hence the dimension error.

Pipe1213 avatar Apr 16 '25 10:04 Pipe1213
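Editor's note: to rule the mono/stereo theory in or out, the channel count of a WAV file can be checked with Python's standard wave module (the commented-out path below is just an example taken from this thread):

```python
import wave

def is_mono(path):
    """Return True if the WAV file at 'path' has exactly one channel."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels() == 1

# Example usage with a hypothetical dataset path:
# print(is_mono("DUMMY1/LJ018-0137.wav"))
```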

I don't think that's the case. I'm also getting this same error using LJSpeech (which is mono) and the default recipe.


andre@1080:~/projects/vits$ soxi DUMMY1/LJ018-0137.wav

Input File     : 'DUMMY1/LJ018-0137.wav'
Channels       : 1
Sample Rate    : 22050
Precision      : 16-bit
Duration       : 00:00:06.97 = 153757 samples ~ 522.983 CDDA sectors
File Size      : 308k
Bit Rate       : 353k
Sample Encoding: 16-bit Signed Integer PCM

andrenatal avatar May 16 '25 04:05 andrenatal

I also encountered this error while using LJSpeech. Do you know how to solve it?

LIDMXI avatar Jun 10 '25 11:06 LIDMXI

I fixed it but now can't remember how :/

andrenatal avatar Jun 10 '25 17:06 andrenatal

Is the original LJSpeech data itself valid? Is it necessary to print each spectrogram to check whether the data has the expected dimensions?

shaoqi2333 avatar Jun 17 '25 11:06 shaoqi2333
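Editor's note: rather than printing every item by hand, a rough sanity check along those lines could scan the cached spectrograms for unexpected shapes. This is a sketch, assuming the *.spec.pt files that VITS's data_utils.py caches next to each wav; the glob pattern is an example, not a fixed path:

```python
import glob
import torch

def find_bad_specs(pattern="DUMMY1/*.spec.pt"):
    """Return (path, shape) for cached spectrograms that are not 2-D (freq, frames)."""
    bad = []
    for path in sorted(glob.glob(pattern)):
        spec = torch.load(path)
        if spec.dim() != 2:
            bad.append((path, tuple(spec.shape)))
    return bad

for path, shape in find_bad_specs():
    print(f"{path}: unexpected shape {shape}")
```

Deleting any flagged .spec.pt files forces VITS to regenerate them on the next run.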

> I fixed it but now can't remember how :/

Hello, did you make a VITS model with your own voice?

surxjj avatar Jun 24 '25 21:06 surxjj

Hello, may I ask whether you have fixed it and can now train on the LJSpeech dataset? How did you fix it? Did you change the version of some packages, or modify some code? If you modified code but don't remember the details, could I refer to your code?

> I fixed it but now can't remember how :/

LIDMXI avatar Jun 25 '25 03:06 LIDMXI

Yes, I managed to train a model using my own datasets, but I can't remember what I did to fix this. I'll look at it again in a couple of days and may be able to find it.

andrenatal avatar Jun 25 '25 04:06 andrenatal

Thank you. If you find it, please let me know how you solved it. Thanks very much for your help.

> Yes, I managed to train a model using my own datasets, but I can't remember what I did to fix this. I'll look at it again in a couple of days and may be able to find it.

LIDMXI avatar Jun 25 '25 04:06 LIDMXI

I solved this problem. The "spectrogram_torch" function in "mel_processing.py" calls "torch.stft()", which now requires the "return_complex" argument to be passed explicitly (the original code did not pass it). Without it, PyTorch raises an error asking you to specify it; if you set it to True, you get the dimension error described above; if you set it to False, training works normally. The other places in that file that call "torch.stft()" need the same change.

LIDMXI avatar Jul 16 '25 05:07 LIDMXI
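Editor's note: a sketch of the kind of change being described, not the exact patch from the repository. Recent PyTorch versions deprecate return_complex=False, so a variant that stays supported is return_complex=True followed by torch.view_as_real, which reproduces the real/imaginary layout the original VITS code expected before taking the magnitude; the function name and default parameters here are illustrative:

```python
import torch

def spectrogram_magnitude(y, n_fft=1024, hop_size=256, win_size=1024):
    """Magnitude spectrogram shaped (batch, n_fft // 2 + 1, frames),
    the 2-D-per-item (freq, frames) layout the VITS collate step expects."""
    window = torch.hann_window(win_size)
    # Newer PyTorch requires return_complex to be set explicitly.
    spec = torch.stft(
        y, n_fft, hop_length=hop_size, win_length=win_size,
        window=window, center=False, return_complex=True,
    )
    spec = torch.view_as_real(spec)                 # (batch, freq, frames, 2)
    return torch.sqrt(spec.pow(2).sum(-1) + 1e-6)   # magnitude

y = torch.randn(1, 22050)  # one second of fake mono audio at 22.05 kHz
print(spectrogram_magnitude(y).shape)  # torch.Size([1, 513, 83])
```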

Correct. My changes are in this PR https://github.com/jaywalnut310/vits/pull/229

andrenatal avatar Jul 21 '25 23:07 andrenatal