Andrei Constantinescu
Is this used for transcribing? You can just use Whisper to transcribe most languages, then convert the output into an HDF5 file plus texts and feed it through a dynamic batch sampler.
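For illustration only, here is a minimal sketch of what a dynamic batch sampler does, written in plain Python rather than against any particular framework. The function name `dynamic_batches` and the `max_total` budget are hypothetical, and the budget here is a simple sum of clip lengths; real samplers often budget by padded size instead.

```python
from typing import Iterator, List, Sequence


def dynamic_batches(lengths: Sequence[float], max_total: float) -> Iterator[List[int]]:
    """Yield lists of sample indices whose summed length stays within max_total.

    Hypothetical helper: batch size varies so that each batch holds roughly
    the same total amount of audio, instead of a fixed number of clips.
    """
    # Sort indices by length so clips of similar length share a batch.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batch: List[int] = []
    total = 0.0
    for i in order:
        # Close the current batch if adding this clip would exceed the budget.
        if batch and total + lengths[i] > max_total:
            yield batch
            batch, total = [], 0.0
        batch.append(i)
        total += lengths[i]
    # Emit whatever is left over as the final batch.
    if batch:
        yield batch
```

A clip longer than `max_total` still ends up in a batch of its own, so nothing is silently dropped.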
Yes, I'd be interested in collaborating too. I've already set up a ViViT (Video Vision Transformer) architecture with this DiT as a reference. If you look at Sora they also...
All I see is that you've removed the phonemizer dependency and use an English-to-IPA library instead. But if I'd like to add languages, can I still use phonemizer?
Is there a specific model loss/validation loss you employed as a benchmark for convergence?
Hello. What do you mean by two different pairs of files? I have the .qnt.pt, phon.txt, normalized.txt, and wav files under my directory data/librosa. The config files are...
> Your phoneme files need to have between 10 and 50 phonemes in them. Try using shorter audio clips; even 10-second clips can be too long. My training samples...
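A quick way to check that constraint before training might look like the sketch below. It assumes the phonemes in each file are separated by whitespace and treats the 10 and 50 bounds as inclusive; both are assumptions, and the helper name `phoneme_count_ok` is hypothetical, so adjust the splitting to match your actual file format.

```python
def phoneme_count_ok(phoneme_text: str, low: int = 10, high: int = 50) -> bool:
    """Return True if the text holds between `low` and `high` phonemes.

    Assumption: phonemes are whitespace-separated tokens. If your files
    store one character per phoneme, count characters instead.
    """
    count = len(phoneme_text.split())
    return low <= count <= high
```

For example, you could read each phoneme file as text, pass its contents to `phoneme_count_ok`, and skip or re-cut any clip for which it returns `False`.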