fish-speech
SOTA Open Source TTS
On a 3080 Ti, inference is still slow even with --compile: it takes at least 120 seconds, even for a short text input.
Hi, are there any experiments on training the LLM with speech input? There are two kinds of input: the indices of the codebook in the codec, as a single integer value, or the indexed...
The conflict is caused by:
    transformers 4.35.2 depends on tokenizers=0.14
    faster-whisper 0.8.0 depends on tokenizers==0.13.*
    transformers 4.35.2 depends on tokenizers=0.14
    faster-whisper 0.7.1 depends on tokenizers==0.13.*
    transformers 4.35.2 depends on tokenizers=0.14...
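To see why no single tokenizers release can satisfy both packages, here is a minimal sketch of the two constraints. It assumes the garbled `tokenizers=0.14` in the report was a lower bound (`>=0.14`); that reading is an assumption, not something the report confirms. The candidate version list and the helper names `parse`, `matches_prefix` and `at_least` are hypothetical and exist only for illustration.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '0.13.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def matches_prefix(version: str, prefix: str) -> bool:
    """Emulate a wildcard pin like ==0.13.* by comparing leading components."""
    wanted = parse(prefix)
    return parse(version)[: len(wanted)] == wanted


def at_least(version: str, minimum: str) -> bool:
    """Emulate a lower bound like >=0.14 (assumed meaning of 'tokenizers=0.14')."""
    return parse(version) >= parse(minimum)


# Illustrative candidate versions only, not a list taken from the report.
candidates = ["0.13.2", "0.13.3", "0.14.0", "0.14.1", "0.15.0"]

# A version is installable only if it satisfies both requirements at once.
compatible = [
    v for v in candidates
    if matches_prefix(v, "0.13") and at_least(v, "0.14")
]

print(compatible)
```

Under that assumption the two ranges do not overlap, so the list stays empty. Resolving the conflict would then mean changing one of the two pins rather than searching for a tokenizers version that fits both.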
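Hello, I would like to train a French TTS model. Do I need to modify the code, and if so, what changes are needed to support French? I would also like to ask roughly how many hours of dry (unprocessed) voice recordings are needed to train a reasonably good TTS. This TTS is for a specialized domain (technology) and does not need strong generalization.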
This pull request addresses an issue in tools/vqgan/inference.py where the import statement for AUDIO_EXTENSIONS was incorrect. The import statement was originally:
```python
from fish_speech.utils.file import AUDIO_EXTENSIONS
```
It has been...
**Is this PR adding a new feature or fixing a bug?** Add feature / Fix bug. **Is this pull request related to any issue? If yes, please link the issue.** #xxx
Training t2s is very slow, about 0.09 it/s. I am using 8 RTX A6000 GPUs with a batch size of 16. Is this training speed normal? I profiled the run with the Lightning profiler, and backward and step take the most time. These are the backward and step results from the advanced profiler:
```
Profile stats for: [Strategy]DDPStrategy.backward rank: 0
190 function calls (185 primitive calls) in 43.795 seconds
Ordered by: cumulative time
ncalls tottime percall...
```
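The "Ordered by: cumulative time" report above has the same layout that Python's standard `cProfile` and `pstats` modules produce. As a rough sketch of how to get a comparable report for a single call outside a training loop, the snippet below profiles a stand-in function. `slow_step` and `profile_call` are hypothetical names used only for illustration; they are not part of fish-speech or Lightning.

```python
import cProfile
import io
import pstats


def slow_step(n: int = 200_000) -> int:
    """Hypothetical stand-in for an expensive step such as backward()."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top: int = 10) -> str:
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    # Write the statistics into a string instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_call(slow_step)
print(report)
```

Sorting by cumulative time puts the outermost expensive call first, which helps show whether the time is spent in the step itself or in something it waits on.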