voicebox-pytorch
Training Example
The training example given seems to be missing the mask vector. In the paper, the input to the model was the audio, the mask, and the phoneme sequence (which was aligned to the audio in the previous implementation of this repo).
So where are the mask vectors and the phoneme sequence used during training?
Thank you, and much appreciation for all you have done.
Can you screenshot or paste the relevant section of the paper for said mask?
I'm introducing Spear-TTS conditioning, proven out in the SoundStorm repository, and bypassing the duration / phoneme / alignment stuff.
Alright, I will read up on Spear-TTS. Could you tell me what the 'cond' variable actually means with respect to an audio clip and its transcript?
And we might have to use a different TTS for other languages for the alignment.
Thank you
@YKoustubhRao thanks for the screenshot
i've decided to automatically manage the condition if you pass in the binary temporal mask described in section 3.2, as cond_mask. it will also be auto-generated during training. during inference, you would construct the mask to zero out the section you would like to infill
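To make that inference-time usage concrete, here is a minimal sketch in plain PyTorch of building a binary temporal mask and zeroing out the section to infill. Note this is illustrative only: the helper name, shapes, and the mask-to-feature step are assumptions for the example, not the repo's actual API (check the README for the real `cond_mask` usage).

```python
import torch

def infill_cond_mask(seq_len, mask_start, mask_end, batch=1):
    """Binary temporal mask: True marks frames to be generated (infilled),
    False marks frames kept as conditioning context."""
    mask = torch.zeros(batch, seq_len, dtype=torch.bool)
    mask[:, mask_start:mask_end] = True
    return mask

# keep frames 0-99 and 200-299 as context, regenerate frames 100-199
cond_mask = infill_cond_mask(seq_len=300, mask_start=100, mask_end=200)

# zero out the masked section of the conditioning features
cond = torch.randn(1, 300, 80)  # e.g. mel-spectrogram-like features (assumed shape)
cond = cond.masked_fill(cond_mask.unsqueeze(-1), 0.)
```

During training the analogous mask is generated automatically, so you only need to construct one yourself when choosing the infill region at inference.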
@YKoustubhRao i will get the phoneme / duration / aligner stuff finished by end of week along with some training code
Is there a pipeline for denoising and zero-shot TTS? @lucidrains
Hello lucidrains, can you share your training script and data preparation code to make it easier to try? Thanks in advance.
Any updates on this?
Same question.
ah, the code is all in there and @lucasnewman has already trained models successfully. i'll update the readme by end of week
Hello. Will the weights be released?
Thank you
Hey all, there's a small pretrained model available in this discussion thread: https://github.com/lucidrains/voicebox-pytorch/discussions/29#discussioncomment-7732769
All the training code is in the repo and I put the details for the training hyperparams in the thread, so training your own model should be as straightforward as instantiating the models, dataset, and trainer and calling train() -- if you're having issues, report back and I can try to help.
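For anyone unsure what a training step does under the hood, here is a small, self-contained sketch of the conditional flow-matching objective that Voicebox trains with, in plain PyTorch. The tiny network and shapes are stand-ins for illustration; the repo's actual Trainer and ConditionalFlowMatcherWrapper handle all of this for you.

```python
import torch
import torch.nn as nn

# toy "velocity field" network standing in for the real transformer;
# input is the interpolated sample x_t concatenated with the time t
model = nn.Sequential(nn.Linear(65, 64), nn.SiLU(), nn.Linear(64, 64))

def flow_matching_step(x1, optimizer):
    """One conditional flow-matching training step.
    x1: batch of target features, shape (batch, dim)."""
    b, _ = x1.shape
    x0 = torch.randn_like(x1)                 # noise sample
    t = torch.rand(b, 1)                      # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # linear interpolation path
    target_velocity = x1 - x0                 # derivative of the path w.r.t. t
    pred = model(torch.cat([xt, t], dim=-1))  # predict velocity given (x_t, t)
    loss = nn.functional.mse_loss(pred, target_velocity)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = flow_matching_step(torch.randn(8, 64), opt)
```

At inference, the learned velocity field is integrated with an ODE solver from noise to data; the Trainer in the repo wraps the loop above with batching, logging, and checkpointing.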
@lucasnewman Thanks for your hyperparams and pretrained model. It achieves acceptable results with a batch size of 32 and 100k steps on a 4090 GPU.
Hey, can you send us sound samples?
@shigabeev, @lucasnewman has some voice samples in the repo; you should be able to reproduce the same results. If you still need samples, let me know and I might be able to send you some.
Yeah, I found his trained model on HF, and it sounds pretty good. However, I wasn't able to figure out how to run it in text-conditioned mode (TTS). Can you show me how to do it? Or could you send some of your TTS audio samples?
@lucidrains I see that the ConditionalFlowMatcherWrapper class currently lacks support for a duration predictor. If you've already worked on this, would it be possible to add it? I'd really appreciate it! Thanks!
@iishapandey hey Isha, thanks for your interest
i would recommend that you take a look at this follow-up research, where they best Voicebox with a simpler scheme
there i include a duration predictor