[Feature]: Possibility of uploading one or more audio files for validation before training starts and displaying them in TensorBoard (during training)

Open Mixomo opened this issue 1 year ago • 8 comments

Description

Add the possibility of uploading one or more audio files for validation from the Applio GUI before training starts, and have them displayed in TensorBoard as training progresses, specifically in the audio tab. That way, users can monitor how their model sounds as the steps advance and decide whether or not to stop training.

Problem

There is no easy, reliable way to check how the model sounds during training. Running inference while training is in progress often fails or slows the whole training down, depending on the user's GPU.

Proposed Solution

I don't know whether the RVC code reserves some audio files from the dataset for validation. If it does, those validation audio files could be shown in TensorBoard (although the option of uploading custom audio files would also be welcome).

Alternatives Considered

n/a

Mixomo avatar Oct 03 '24 15:10 Mixomo

Once the actual training starts, you can click Create Index and get that done. While training is running, you can refresh the inference screen and run inference with the newly generated models. Yes, it is a little slower than without training running in parallel, but it does not slow things down too much. If you think that automatically inferring a test audio is somehow faster, it is not.

AznamirWoW avatar Oct 03 '24 16:10 AznamirWoW

We are looking at something similar to that; I'll update.

blaisewf avatar Oct 03 '24 17:10 blaisewf

@Mixomo you can grab the updated rvc/train/train.py from the repository and give it a try

AznamirWoW avatar Oct 03 '24 21:10 AznamirWoW

@AznamirWoW It works like a charm! Thank you!

@blaisewf I don't know if you want to discuss anything else regarding this, or if not, you can close the issue. Thank you very much.

Mixomo avatar Oct 04 '24 21:10 Mixomo

We will look at ways to customize the experience further, but I think it's a good starting point.

blaisewf avatar Oct 04 '24 21:10 blaisewf

@AznamirWoW for the TensorBoard, I would recommend having all audio in one place with a slider to move between steps; that's the way SVC used to do it, if I remember correctly. It would make things less cluttered, especially if you save a lot. Also, would it be possible to run inference on the same audio every time? That way you can compare versions more easily, and you could even have the "ground truth" below it, which is just the raw file from the dataset, for easy comparison. Either way, great work! It makes training a lot easier when you can hear the progression as you are training! This is a feature I have been asking for for a LONG time!
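
A minimal sketch of the "same audio every time, with ground truth alongside" idea, assuming a writer object that exposes `add_audio(tag, snd_tensor, global_step, sample_rate)` the way `torch.utils.tensorboard.SummaryWriter` does. The helper names, the `gt/audio` tag, and the seed value are hypothetical and are not taken from Applio's `train.py`:

```python
import random


def pick_reference_index(dataset_size: int, seed: int = 1234) -> int:
    """Return the same dataset index on every call for a given seed."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    # A private Random instance keeps the choice independent of global RNG state.
    return random.Random(seed).randrange(dataset_size)


def log_reference_pair(writer, ground_truth, generated, global_step, sample_rate):
    """Log the raw dataset clip and the model output for the same sample."""
    # Hypothetical tag names; any stable pair of names would work.
    writer.add_audio("gt/audio", ground_truth, global_step, sample_rate)
    writer.add_audio("gen/audio", generated, global_step, sample_rate)
```

Because the index depends only on the seed and the dataset size, every checkpoint would be evaluated on the same clip, which is what makes the versions comparable by ear.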

kro-ai avatar Oct 07 '24 12:10 kro-ai

It does not look like a slider is possible for audio.

If I use f"gen/audio_{global_step:07d}" it creates a new entry; if I use f"gen/audio" it overwrites the previous one.

As for using a constant sample, yes, that's possible.
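
For illustration, the two tagging options mentioned above could be wrapped like this. This is only a sketch: `audio_tag` and `log_generated_audio` are hypothetical helpers, and `writer` is assumed to be anything with an `add_audio(tag, snd_tensor, global_step, sample_rate)` method, such as `torch.utils.tensorboard.SummaryWriter`:

```python
def audio_tag(global_step: int, per_step: bool = True) -> str:
    """Build the TensorBoard tag for a generated audio sample."""
    if per_step:
        # Zero-padded step in the tag: one separate entry per logged step.
        return f"gen/audio_{global_step:07d}"
    # Constant tag: every step is logged under the same entry.
    return "gen/audio"


def log_generated_audio(writer, audio, global_step, sample_rate, per_step=True):
    """Log one generated sample and return the tag that was used."""
    tag = audio_tag(global_step, per_step)
    writer.add_audio(tag, audio, global_step, sample_rate)
    return tag
```

Padding the step to seven digits keeps the entries in step order when the tags are sorted as text.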

AznamirWoW avatar Oct 07 '24 15:10 AznamirWoW

I am misremembering, I apologize. I just looked at a video of someone using So-VITS, and it is indeed just as you have implemented it now.

kro-ai avatar Oct 07 '24 15:10 kro-ai