Applio [Bug]: Training Error -- Need At Least 1 Array to Concatenate Issue

Project Version

3.3.0

Platform and OS Version

W11

Affected Devices

RTX 5090 32GB

Existing Issues

https://github.com/IAHispano/Applio/issues/1020

What happened?

I can get to Step 4 (Training), but when I click "Start Training" it runs for about 5 seconds before stopping. This is what the logs show:

Starting pitch extraction on cuda:0 using rmvpe...
100%|██████████| 394/394 [00:17<00:00, 22.01it/s]
Pitch extraction completed in 21.53 seconds.
Starting embedding extraction with 16 cores on cuda:0...
Embedding extraction completed in 2.95 seconds.
Not enough data present in the training set. Perhaps you forgot to slice the audio files in preprocess?
An error occurred extracting the index: need at least one array to concatenate
If you are running this code in a virtual environment, make sure you have enough GPU available to generate the Index file.
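For context, "need at least one array to concatenate" is the ValueError text NumPy raises when `np.concatenate` is given an empty sequence. The sketch below reproduces it and shows a guard with a clearer message. It assumes, without confirmation from the Applio source, that the index step collects per-file feature arrays into a list before concatenating them; `stack_features` is a hypothetical helper, not an Applio function.

```python
import numpy as np


def stack_features(arrays):
    """Concatenate per-file feature arrays along axis 0.

    Hypothetical helper: raises a descriptive error instead of NumPy's
    generic one when no arrays were collected.
    """
    if len(arrays) == 0:
        raise RuntimeError(
            "no feature arrays found; the extraction step probably wrote no output"
        )
    return np.concatenate(arrays, axis=0)


# Reproduce the raw NumPy error seen in the log.
try:
    np.concatenate([])
except ValueError as err:
    raw_message = str(err)

print(raw_message)
```

If this assumption holds, the error points at an earlier step that produced no features rather than at the index code itself.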

Other info:

I did look at https://github.com/IAHispano/Applio/issues/1020, which is from April 2025, and I ran the PyTorch commands from that issue using the person's /env/python. That still didn't help.

Steps to reproduce

1. Go to the Training tab and name the new model as usual, with the sampling rate at 48000 and CPU cores at 16, which is the default. GPU is index 0 (RTX 5090). This step seemingly runs without issue.
2. Preprocess: point the dataset (already auto-selected in the dropdown) to the correct location, which holds about 400 high-quality WAV files of 1 to 8 seconds each, at 768 kbps and 48 kHz. These are all cleaned and formatted already. I choose "Skip" for audio cutting, add the noise filter, set normalization mode to "none", and leave noise reduction off. The output says the model preprocessed successfully. Good so far.
3. Extract: default rmvpe and contentvec. I put 0 for silent training files but also tried the default value of 2. Still successful.
4. Training: for batch size I tried 4 all the way up to 24, with "save every epoch" and "total epochs" at their default values. I agree to the terms of use and click "Start Training", then run into the logs above. Generating the index before or after doesn't change anything either.

Expected behavior

It should have started training without any errors.

Attachments

No response

Screenshots or Videos

Additional Information

No response

Aug 19 '25 02:08 NitrogenSulfide

what did you select for embedding extraction?

Aug 19 '25 03:08 AznamirWoW

> what did you select for embedding extraction?

I selected contentvec

Edit: though, I don't specifically see a section for "embedding extraction" anywhere on the Training page.

Aug 19 '25 03:08 NitrogenSulfide

btw, you did not need to run anything for 5090 support, that fix is included in v3.3.0

Aug 19 '25 03:08 AznamirWoW

rename rvc\train\extract\extract.py to extract.old, then drop the attached file into the same folder. Restart Applio and try running extract features again. It should show an error message if there's any.

extract.py

Aug 19 '25 03:08 AznamirWoW

> rename rvc\train\extract\extract.py to extract.old, then drop the attached file into the same folder. Restart Applio and try running extract features again. It should show an error message if there's any.
>
> extract.py

Thanks for the new file. I didn't see anything new in the logs:

`An error occurred connecting to Discord: Could not find Discord installed and running on this machine.
Failed to launch on port 6969, trying again on port 6968...
Failed to launch on port 6968, trying again on port 6967...
Failed to launch on port 6967, trying again on port 6966...
Running on local URL: http://127.0.0.1:6966
To create a public link, set share=True in launch().
Starting preprocess with 16 processes...
100%|██████████| 394/394 [00:08<00:00, 48.96it/s]
Preprocess completed in 8.05 seconds on 00:40:28 seconds of audio.
Starting pitch extraction on cuda:0 using rmvpe...
100%|██████████| 1097/1097 [00:00<00:00, 21320.38it/s]
Pitch extraction completed in 3.70 seconds.
Starting embedding extraction with 16 cores on cuda:0...
Embedding extraction completed in 2.97 seconds.
Not enough data present in the training set. Perhaps you forgot to slice the audio files in preprocess?
An error occurred extracting the index: need at least one array to concatenate
If you are running this code in a virtual environment, make sure you have enough GPU available to generate the Index file.`

Maybe I should try a clean manual git install of Applio into a separate folder and try again?

Aug 19 '25 03:08 NitrogenSulfide

> Maybe I should try a clean manual git install of Applio into a separate folder and try again?

Don't think that's the problem.

Normal extract feature should look like this:

Can you share your dataset?

Aug 19 '25 03:08 AznamirWoW

Example.wav

I have shared an example file I am using. There are about 300-400 other files with the exact same properties, except for length. The length will vary between 1 to 8 seconds.

Aug 19 '25 03:08 NitrogenSulfide

> I have shared an example file I am using. There are about 300-400 other files with the exact same properties, except for length. The length will vary between 1 to 8 seconds.

If you want me to try to reproduce your issue, I need your whole set.

Generally I do not recommend using loose files, as they come in different durations. Ideally you want a uniformly sliced dataset with some overlap between segments.
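A quick way to see how uneven a folder of loose files is could look like the sketch below. It uses only the standard-library `wave` module, which reads plain PCM WAV files and may reject other encodings such as 32-bit float. The helper names and the 3 to 5 second bounds in the demo are placeholders chosen for illustration, and the demo writes two silent files into a temporary directory instead of touching a real dataset.

```python
import tempfile
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed from frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def durations_outside(folder, min_s, max_s):
    """Map file name -> duration for every .wav outside [min_s, max_s]."""
    flagged = {}
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration < min_s or duration > max_s:
            flagged[path.name] = duration
    return flagged


def write_silence(path, seconds, rate=48000):
    """Write a mono 16-bit silent WAV file (demo data only)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


# Demo on generated files; point durations_outside at a real folder instead.
with tempfile.TemporaryDirectory() as tmp:
    write_silence(Path(tmp) / "short.wav", 1.0)
    write_silence(Path(tmp) / "ok.wav", 4.0)
    flagged = durations_outside(tmp, min_s=3.0, max_s=5.0)

print(flagged)
```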

Aug 19 '25 03:08 AznamirWoW

Ok interestingly enough, I did try a clean download of that huge 5gb Applio manual install, and it seems to be working as of right now.

> If you want me to try and reproduce your issue, I need your whole set. Generally I do not recommend using loose files as they come with different duration. Ideally you want a uniformly sliced dataset with some overlaps between segments.

Ah ok, is that what the preprocess slicing is for? I noticed the default is 3 seconds, and I see that longer wav files are cut up so sentences are cut off etc. Is that ok?

I forgot to mention that the original install was running off of Dione.

Aug 19 '25 03:08 NitrogenSulfide

> Ah ok, is that what the preprocess slicing is for? I noticed the default is 3 seconds, and I see that longer wav files are cut up so sentences are cut off etc. Is that ok?

Preprocess slicing accepts large files (10+ minutes) and uses either the new simple method or the old automatic slicing method. The simple method is very fast, and it assumes you've prepared your dataset file and removed all excessive silences (>0.25s). The automatic method attempts to find gaps between words and uses those as slice points.

Simple slicing produces uniform 3s slices with 0.3s overlap. The automatic method may produce the same 3s slices and overlap if it finds no silences, or 1-5s segments if it does.

Using segments smaller than 3s or larger than 5s is not recommended.
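To make the "3s slices with 0.3s overlap" arithmetic concrete, here is a minimal sketch that computes slice boundaries in samples. It is not Applio's implementation: `slice_bounds` is a hypothetical name, and keeping a shorter final slice is an assumption about how the tail might be handled.

```python
def slice_bounds(total_samples, sample_rate, slice_s=3.0, overlap_s=0.3):
    """Return (start, end) sample indices for fixed-length overlapping slices.

    Each slice starts (slice_s - overlap_s) seconds after the previous one,
    so neighbouring slices share overlap_s seconds of audio. The last slice
    is kept even if it is shorter than slice_s (an assumption of this sketch).
    """
    slice_len = int(round(slice_s * sample_rate))
    hop = slice_len - int(round(overlap_s * sample_rate))
    if hop <= 0:
        raise ValueError("overlap must be shorter than the slice length")

    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + slice_len, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break
        start += hop
    return bounds


# A 10-second file at 48 kHz.
bounds = slice_bounds(total_samples=480_000, sample_rate=48_000)
for start, end in bounds:
    print(start / 48_000, end / 48_000)
```

With these defaults the hop between slice starts is 2.7s, so a 10-second file yields three full 3s slices plus a shorter tail.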

Aug 19 '25 03:08 AznamirWoW

Ok gotcha. I just went with the default simple slice. Gonna play around with my dataset and settings. Thanks for the help and useful info!

Aug 19 '25 03:08 NitrogenSulfide

I did exactly what is described in the issue on rented 3090/4090 servers with different versions of CUDA. Everything worked fine before, but now, after installing this:

Aug 19 '25 10:08 idolistmedalist

> I did exactly the same as described in the problem on rented servers 3090/4090 on different versions of cuda, before everything worked fine, but now after installing this:

Did you install it manually? Or did you use a compiled version?

Aug 19 '25 11:08 AznamirWoW

> > I did exactly the same as described in the problem on rented servers 3090/4090 on different versions of cuda, before everything worked fine, but now after installing this:
>
> Did you install it manually? Or did you use a compiled version?

Manually. Everything worked fine before; now I tried it on CUDA 12.6 and this comes up.

Aug 19 '25 12:08 idolistmedalist

> Manually, everything worked fine before, now I tried it on cuda 12.6 and this comes up.

It is not clear. Are you using Linux? Windows? WSL on Windows?

To test why f0 extraction fails, try running a simple inference using rmvpe. Something is breaking the extraction; it could be a bad rmvpe.pt file.

Aug 19 '25 13:08 AznamirWoW

> > Manually, everything worked fine before, now I tried it on cuda 12.6 and this comes up.
>
> It is not clear. Are you using Linux? Windows? WSL on Windows?
>
> To test why f0 extraction fails, try running a simple inference using rmvpe. Something is breaking the extraction, possibly a corrupted rmvpe.pt file.

Linux desktop. The files are probably not corrupted, since I used them for training on version 3.2.9 before.

Aug 19 '25 13:08 idolistmedalist

i see two OS here

Aug 19 '25 13:08 blaisewf

> i see two OS here

I rent 3090/4090 servers on Linux, just because training takes a very long time on my PC.

Aug 19 '25 14:08 idolistmedalist

> I rent servers on 3090/4090 on linux my pc has a very long training time just

As I said, run inference. That may help you diagnose why f0 extraction is failing.

As for Linux and CUDA, I have no idea. I can only suggest checking that the cuDNN libraries are installed properly.

https://chatgpt.com/share/68a47e5f-5d7c-800a-8aa1-ca38a1ba0a93
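One way to act on the cuDNN suggestion is to ask PyTorch what it can see from inside the same Python environment Applio runs in. The sketch below only reports the PyTorch build, CUDA visibility, and cuDNN status. It does not test rmvpe or f0 extraction, and `cuda_report` is a hypothetical helper name.

```python
def cuda_report():
    """Collect basic PyTorch / CUDA / cuDNN facts for the current environment."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this interpreter, so nothing else can be checked.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # None if unavailable
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `cuda_available` or `cudnn_available` comes back False on a GPU server, the environment is the first thing to look at before suspecting the dataset or the model files.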

Aug 19 '25 14:08 AznamirWoW

> > I rent servers with 3090/4090 on Linux; my PC has a very long training time.
>
> As I said, run inference. That may help you diagnose why f0 extraction is failing.
>
> As for Linux and CUDNN, I have no idea. I can only suggest checking that the CUDNN libraries are installed properly.
>
> https://chatgpt.com/share/68a47e5f-5d7c-800a-8aa1-ca38a1ba0a93

Ok, thanks

Aug 19 '25 14:08 idolistmedalist