ukemamaster
### Describe the bug
I have fine-tuned the XTTS v2 model on my own data, containing both long and short audios (with the following histogram showing duration in seconds on...
@joonson To train a binary classifier (having 2 speakers in the entire data), what should the values of `max_seg_per_spk`, `nClasses`, `nPerSpeaker`, and `batch_size` be? I have been trying...
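The parameter names in the question above come from a speaker-recognition trainer; as a hedged back-of-the-envelope sketch (the exact sampler behaviour is an assumption, not verified against the repo), the usual constraint is that each batch draws `nPerSpeaker` segments from `batch_size` *distinct* speakers, so with only 2 speakers `batch_size` cannot exceed 2:

```python
# Hypothetical sizing check for a 2-speaker (binary) setup.
# Assumption: the sampler builds each batch from `batch_size` distinct
# speakers, taking `nPerSpeaker` segments from each.
n_speakers = 2                    # binary task: 2 speakers in the data
nClasses = n_speakers             # one softmax class per speaker
nPerSpeaker = 2                   # segments per speaker (pairwise losses need >= 2)
batch_size = min(2, n_speakers)   # cannot exceed the number of distinct speakers
max_seg_per_spk = 500             # illustrative cap on segments used per speaker

segments_per_batch = batch_size * nPerSpeaker
print(segments_per_batch)  # → 4
```

The values above are illustrative only; the real limits depend on how the trainer's sampler groups utterances.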
Hi, nice work and congratulations on your paper. Do you plan to open-source the code in the near future?
## What
In the re-cutting stage I would like an option to specify a minimum audio length, because the re-cut audios are very, very short and...
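Until such an option exists, a minimal workaround is to post-filter the re-cut segments by duration. The sketch below is hypothetical: the function name and the `(start, end)` segment format are assumptions for illustration, not any tool's actual API.

```python
def filter_short_segments(segments, min_len=1.0):
    """Drop (start, end) segments shorter than min_len seconds.

    `segments` is a list of (start, end) pairs in seconds; the format
    is assumed here and may differ from the tool's real output.
    """
    return [(start, end) for start, end in segments if end - start >= min_len]

# Example: only the 2-second segment survives a 1-second minimum.
print(filter_short_segments([(0.0, 0.4), (1.0, 3.0), (3.1, 3.5)], min_len=1.0))
# → [(1.0, 3.0)]
```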
Is it possible to use the `musicgen-melody` model in the [Transformers library](https://github.com/huggingface/transformers) like the [`musicgen-small` model](https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md#-transformers-usage)? I gave it a try:

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration
processor...
```
@hmartiro In interpolation, I always get repetitive music every 5 seconds, even with your seed images from the Hugging Face repo. Any tips to avoid this?
Hi @hmartiro, could you please explain how you generated the seed images? Are they simply spectrograms of music audios, or was some pre- or post-filtering applied? When I use...
Hi @hmartiro. Could you please confirm whether this app works in interpolation mode or in simple (text-to-audio) mode? The simple (text-to-audio) mode generates 5.12 (for...
Hi @OlaWod, I appreciate your work. I am trying to fine-tune the FreeVC model on my custom multilingual data (using an already-trained speaker encoder model), and without SR...
Hi @joonson, could you please give some hints on making it work for multi-node, multi-GPU distributed training?