[Bug]: Training Error -- Need At Least 1 Array to Concatenate Issue
Project Version
3.3.0
Platform and OS Version
W11
Affected Devices
RTX 5090 32GB
Existing Issues
https://github.com/IAHispano/Applio/issues/1020
What happened?
I can get to Step 4 (Training), but when I click "Start Training" it runs for about 5 seconds and then stops. This is what the logs show:
Starting pitch extraction on cuda:0 using rmvpe...
100%|██████████| 394/394 [00:17<00:00, 22.01it/s]
Pitch extraction completed in 21.53 seconds.
Starting embedding extraction with 16 cores on cuda:0...
Embedding extraction completed in 2.95 seconds.
Not enough data present in the training set. Perhaps you forgot to slice the audio files in preprocess?
An error occurred extracting the index: need at least one array to concatenate
If you are running this code in a virtual environment, make sure you have enough GPU available to generate the Index file.
Other info:
I did look at https://github.com/IAHispano/Applio/issues/1020, which is from April 2025, and I executed the PyTorch commands using the person's /env/python. That still didn't help.
Steps to reproduce
- Go to the Training tab and name the new model as usual, with sampling rate at 48000 and CPU cores at 16, which is the default. GPU is index 0, the RTX 5090. This step seemingly runs without issue.
- Preprocess section: point the dataset path (already auto-selected in the dropdown) to the correct location, which holds about 400 high-quality WAV files of 1-8 seconds each at 768 kbps and 48 kHz. These are all cleaned and formatted already. I choose SKIP for audio cutting, add the noise filter, choose "none" for normalization mode, and use no noise reduction. The output says the model preprocessed successfully. Good so far.
- Extract: default rmvpe and contentvec. I put 0 for silent training files but also tried the default value of 2. Still successful.
- Training: for batch size I tried 4 all the way up to 24, with the default values for save every epoch and total epochs. I agree to the terms of use and click "Start Training", then run into the logs above. Generating the index before or after doesn't change anything either.
Expected behavior
It should have started training without any errors.
Attachments
No response
Screenshots or Videos
Additional Information
No response
what did you select for embedding extraction?
I selected contentvec
Edit: though, I don't specifically see a section for "embedding extraction" anywhere on the Training page.
btw, you did not need to run anything for 5090 support, that fix is included in v3.3.0
rename rvc\train\extract\extract.py to extract.old, then drop the attached file into the same folder. Restart Applio and try running extract features again. It should show an error message if there's any.
Thanks for the new file. I didn't see anything new in the logs:
`An error occurred connecting to Discord: Could not find Discord installed and running on this machine.
Failed to launch on port 6969, trying again on port 6968...
Failed to launch on port 6968, trying again on port 6967...
Failed to launch on port 6967, trying again on port 6966...
- Running on local URL: http://127.0.0.1:6966
To create a public link, set share=True in launch().
Starting preprocess with 16 processes...
100%|██████████████████████████████████████████████████████████████████████████████████| 394/394 [00:08<00:00, 48.96it/s]
Preprocess completed in 8.05 seconds on 00:40:28 seconds of audio.
Starting pitch extraction on cuda:0 using rmvpe...
100%|█████████████████████████████████████████████████████████████████████████████| 1097/1097 [00:00<00:00, 21320.38it/s]
Pitch extraction completed in 3.70 seconds.
Starting embedding extraction with 16 cores on cuda:0...
Embedding extraction completed in 2.97 seconds.
Not enough data present in the training set. Perhaps you forgot to slice the audio files in preprocess?
An error occurred extracting the index: need at least one array to concatenate
If you are running this code in a virtual environment, make sure you have enough GPU available to generate the Index file.`
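The message `need at least one array to concatenate` is the ValueError NumPy raises when `numpy.concatenate` receives an empty sequence, which suggests the index step found no extracted feature arrays to stack. Below is a minimal stdlib-only sketch of a pre-check, assuming the features are stored as `.npy` files in a single folder. The folder layout and the helper names `count_feature_files` and `require_features` are hypothetical, not Applio's actual code:

```python
import tempfile
from pathlib import Path


def count_feature_files(feature_dir: str) -> int:
    """Count .npy files directly inside feature_dir (hypothetical layout)."""
    return sum(1 for p in Path(feature_dir).iterdir() if p.suffix == ".npy")


def require_features(feature_dir: str) -> int:
    """Fail early with a readable message instead of an empty concatenate."""
    count = count_feature_files(feature_dir)
    if count == 0:
        raise RuntimeError(
            f"No .npy feature files found in {feature_dir}; "
            "the extraction step likely produced no output."
        )
    return count


if __name__ == "__main__":
    # Demo on a throwaway folder: empty first, then with one placeholder file.
    with tempfile.TemporaryDirectory() as tmp:
        print(count_feature_files(tmp))
        (Path(tmp) / "0_0.npy").touch()
        print(require_features(tmp))
```

Running a check like this between the extract and training steps would show whether extraction silently wrote nothing, which is what the log above hints at.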
Maybe I should try a manual git install of Applio? into a separate folder cleanly, and try again?
Don't think that's the problem.
Normal extract feature should look like this:
Can you share your dataset?
I have shared an example file I am using. There are about 300-400 other files with the exact same properties, except for length. The length will vary between 1 to 8 seconds.
If you want me to try and reproduce your issue, I need your whole set.
Generally I do not recommend using loose files as they come with different duration. Ideally you want a uniformly sliced dataset with some overlaps between segments.
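One way to see how uneven a folder of loose files is: list each WAV file's duration and flag those outside a target range. This is a minimal sketch using Python's standard `wave` module, assuming uncompressed PCM WAV files. The helper names and the 3-5 s default range (taken from the recommendation later in this thread) are illustrative, not part of Applio:

```python
import wave
from pathlib import Path


def wav_duration(path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def out_of_range(folder, min_s: float = 3.0, max_s: float = 5.0) -> list:
    """List (file name, duration) pairs whose duration is outside [min_s, max_s]."""
    flagged = []
    for path in sorted(Path(folder).glob("*.wav")):
        seconds = wav_duration(path)
        if not (min_s <= seconds <= max_s):
            flagged.append((path.name, round(seconds, 2)))
    return flagged
```

Calling `out_of_range("path/to/dataset")` returns the files that fall outside the range, which gives a rough idea of how much of the set would benefit from re-slicing.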
Ok interestingly enough, I did try a clean download of that huge 5gb Applio manual install, and it seems to be working as of right now.
Ah ok, is that what the preprocess slicing is for? I noticed the default is 3 seconds, and I see that longer wav files are cut up so sentences are cut off etc. Is that ok?
I forgot to mention that the original was running off of Dione.
Preprocess slicing accepts large files (10+ minutes) and uses either the new simple method or the old automatic slicing method. The simple method is very fast and assumes you have prepared your dataset file and removed all excessive silences (>0.25s). The automatic method attempts to find gaps between words and uses them as slice points.
Simple slicing produces uniform 3s slices with 0.3s overlap. The automatic method may produce the same 3s slices and overlap if it finds no silences, or 1-5s segments if it does.
Using segments shorter than 3s or longer than 5s is not recommended.
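To make the simple method's numbers concrete, here is a sketch that computes slice boundaries for 3 s slices with 0.3 s overlap, so each slice starts 2.7 s after the previous one. It illustrates the description above and is not Applio's implementation; the function name and the choice to keep a shorter final slice are assumptions:

```python
def simple_slice_bounds(total_samples, sample_rate, slice_s=3.0, overlap_s=0.3):
    """Return (start, end) sample indices of uniform, overlapping slices."""
    chunk = round(slice_s * sample_rate)
    hop = chunk - round(overlap_s * sample_rate)  # distance between slice starts
    if hop <= 0:
        raise ValueError("overlap must be shorter than the slice length")
    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))
        if end == total_samples:  # assumption: the tail is kept as a shorter slice
            break
        start += hop
    return bounds


if __name__ == "__main__":
    # A 10 s file at 48 kHz.
    for start, end in simple_slice_bounds(480_000, 48_000):
        print(f"{start / 48_000:.1f}s - {end / 48_000:.1f}s")
```

For a 10 s file at 48 kHz this yields four slices starting at 0.0, 2.7, 5.4 and 8.1 s, the last one 1.9 s long. A real slicer might drop or pad that short tail instead.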
Ok gotcha. I just went with the default simple slice. Gonna play around with my dataset and settings. Thanks for the help and useful info!
I did exactly the same as described in the problem on rented servers 3090/4090 on different versions of cuda, before everything worked fine, but now after installing this:
Did you install it manually? Or did you use a compiled version?
Manually, everything worked fine before, now I tried it on cuda 12.6 and this comes up.
It is not clear. Are you using Linux? Windows? WSL on Windows?
To test why f0 extraction fails, try running a simple inference using rmvpe. Something is breaking the extraction; it could be a bad rmvpe.pt file.
Linux desktop. The files are probably not corrupted, since I used them for training on version 3.2.9 before.
i see two OS here
I rent servers with 3090/4090 GPUs running Linux; training just takes a very long time on my own PC.
As I said, run inference - that may help you diagnose why f0 extraction is failing
As for Linux and CUDA, I have no idea. I can only suggest checking that the cuDNN libraries are installed properly.
https://chatgpt.com/share/68a47e5f-5d7c-800a-8aa1-ca38a1ba0a93
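A quick way to check what PyTorch itself sees is to query its CUDA and cuDNN status from the same Python environment Applio runs in. This is a minimal sketch; the function name `cuda_report` is made up for this example, and it only reports what PyTorch detects, not whether the system-wide libraries are installed correctly:

```python
def cuda_report() -> dict:
    """Collect the CUDA/cuDNN status as seen by PyTorch, if it is installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is missing
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` or `cudnn_available` comes back False on a GPU server, the problem is more likely in the environment than in the dataset.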
Ok, thanks