FAcodec
May I ask how you eliminated the need for phoneme-audio alignment by predicting the semantic latent?
Can you indicate in which file you implemented this feature?
Also, as you wrote in the README: \t<speakeer_id>\t