[Feature]: Upload one or more validation audios before training starts and display them in TensorBoard (during training)
Description
The ability to upload one or several validation audio files from the Applio GUI before training starts, and to have them displayed in TensorBoard as training progresses, specifically in the Audio tab. This way the user can monitor how the model sounds as the steps progress and decide whether or not to stop the training.
Problem
There is no easy, reliable way to check how the model sounds: if you try to run an inference during training, it often fails, or it slows down the whole training, depending on the user's GPU.
Proposed Solution
I don't know whether RVC's code reserves some audios from the dataset for validation. If so, those validation audios could be shown in TensorBoard (though the option to upload custom audios would also be nice).
Alternatives Considered
n/a
Once the real training starts, you can click Create Index and get it done. While training is running you can refresh the inference screen and run inference with the newly generated models. Yes, it is a little slower than without training running in parallel, but it does not slow things down too much. If you think that automatically inferring a test audio is somehow faster, it is not.
we are looking at something similar to that, I'll update
@Mixomo you can grab the updated rvc/train/train.py from the repository and give it a try
@AznamirWoW It works like a charm! Thank you!
@blaisewf I don't know if you want to discuss anything else regarding this, or if not, you can close the issue. Thank you very much.
we will look at ways to customize the experience further, but I think it's a good starting point
@AznamirWoW For the TensorBoard side, I would recommend keeping all the audio in one place and having a slider to move between steps; that's the way SVC used to do it, if I remember correctly. It would make things less cluttered, especially if you save often. Also, would it be possible to infer the same audio each time? That way you can more easily compare between versions, and you could even have the "ground truth" below, which is just the raw file from the dataset, making for easy comparison. Either way, great work! It makes training a lot easier when you can hear the progression as you train. This is a feature I have been asking for for a LONG time!
It does not look like a slider is possible for audio.
If I use f"gen/audio_{global_step:07d}" it creates a new entry; if I use f"gen/audio" it overwrites the previous one.
As for using a constant sample, yes, that's possible.
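A minimal sketch of the two tagging strategies discussed above. TensorBoard keys audio summaries by tag, so a tag that includes the step accumulates one entry per save, while a constant tag overwrites and keeps only the latest clip. The `audio_tag` helper is hypothetical (not part of RVC's code), and the commented usage assumes a `torch.utils.tensorboard.SummaryWriter` instance in the training loop:

```python
def audio_tag(global_step: int, keep_history: bool) -> str:
    """Build a TensorBoard tag for a logged validation audio clip.

    keep_history=True  -> unique tag per step, every clip stays listed
                          in the Audio tab (can get cluttered).
    keep_history=False -> constant tag, each write replaces the previous
                          clip, so only the latest is shown.
    """
    if keep_history:
        # Zero-padded step keeps the entries sorted in the Audio tab
        return f"gen/audio_{global_step:07d}"
    return "gen/audio"

# Hypothetical usage inside the training loop (assumes `writer`,
# a waveform tensor `audio`, and `sample_rate` already exist):
#   writer.add_audio(audio_tag(global_step, keep_history=True),
#                    audio, global_step, sample_rate=sample_rate)
```

With `keep_history=True`, `audio_tag(1000, True)` yields `gen/audio_0001000`, so each checkpoint's clip remains available for comparison rather than being overwritten.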
I am misremembering, I apologize. I just looked at a video of someone using So-VITS, and it is indeed just as you have implemented it now.