
FACodec and training

Open yiwei0730 opened this issue 1 year ago • 7 comments

I saw that there is another library called super-gpt-facodec. Is there any chance I can chain them together? I also want to ask about training: should super-gpt-facodec and supervoice be trained separately? Is there a step-by-step guide I can follow?

yiwei0730 avatar Mar 28 '24 02:03 yiwei0730

I tried to train FACodec, but haven't succeeded. In general, I am not satisfied with FACodec's performance.

ex3ndr avatar Mar 28 '24 02:03 ex3ndr

Thanks for your answer. What if I need to train on Chinese and English (about 400,000 samples in total)? @ex3ndr Do I need to train the two libraries supervoice-gpt and supervoice separately? What are the steps for training?

yiwei0730 avatar Mar 28 '24 03:03 yiwei0730

I am also discouraged by FACodec's performance.

rishikksh20 avatar Mar 31 '24 10:03 rishikksh20

I would like to ask if you can describe in detail how the codec performed in your tests. The authors have emphasized that FACodec is the key to NS3, but the parameter count added and used in NS3 is indeed quite large.

yiwei0730 avatar Apr 01 '24 07:04 yiwei0730

If this question is for me: my main problem is that FACodec is frame-based, and predicting its tokens would be challenging because of how many of them you need. Also, my tests didn't show what was promised in the paper. The codes are not disentangled: you still need the residual codes for decent voice quality, the content codes still depend on speaker identity, etc.
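To make the "how many tokens" concern concrete, here is a back-of-the-envelope count. The frame rate and number of code streams below are placeholder values for illustration only, not FACodec's confirmed configuration (check the NS3 paper for the real numbers):

```python
# Placeholder values for illustration; not FACodec's actual configuration.
frame_rate_hz = 80   # codec frames per second of audio (assumed)
streams = 6          # parallel code streams: prosody, content, residual, ... (assumed)
seconds = 10         # length of the utterance

tokens = frame_rate_hz * streams * seconds
print(tokens)  # 4800 tokens for just 10 seconds of speech
```

Even with modest assumed numbers, an autoregressive model would have to predict thousands of tokens per utterance, which is why frame-based codes are hard to model compared with coarser phoneme-level units.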

ex3ndr avatar Apr 01 '24 17:04 ex3ndr

@yiwei0730 I have the same conclusion as @ex3ndr: Gradient Reversal is not a 100% reliable mechanism, and FACodec relies heavily on it, which results in information leakage between the codes. That is why the codes are not properly disentangled. It might work within the NS3 architecture, because all the codes are ultimately combined before being fed to the decoder, but if you plan to use the codec separately, it won't work properly and will result in poor quality.
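For context on the mechanism being criticized: a gradient reversal layer is an identity function in the forward pass that flips the gradient's sign in the backward pass, so the encoder is trained *against* an auxiliary classifier (e.g. a speaker classifier on content codes) and learns to remove the information that classifier exploits. A minimal NumPy sketch of the idea, purely illustrative and not taken from FACodec's or supervoice's code:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; negates (and scales) the gradient
    in the backward pass, producing an adversarial training signal."""

    def __init__(self, lambd=1.0):
        self.lambd = lambd  # strength of the adversarial signal

    def forward(self, x):
        # Features pass through unchanged.
        return x

    def backward(self, grad_output):
        # The classifier's gradient is flipped before reaching the encoder,
        # pushing the encoder to remove the information the classifier uses.
        return -self.lambd * grad_output
```

The failure mode described above is that this adversarial pressure only discourages leakage; nothing guarantees the information is actually removed, so codes can remain entangled.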

rishikksh20 avatar Apr 02 '24 05:04 rishikksh20

Well, I don't think the results are good enough to slap diffusion on top of it either: the problem with Voicebox is that it is too versatile and you need to control it, but these tokens are probably too tied to specific speech styles to be useful.

ex3ndr avatar Apr 02 '24 07:04 ex3ndr