WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Looking forward to your unified model.
In Section 3.2 of the paper, you presented the probability density for each index in the codebook. Could you explain how this was achieved? Also, during this process, were you...
Hey, I am Christoph, one of the co-founders of LAION. We are working on open-source models like GPT-4o and are looking for a better audio codec than SNAC, which...
Has anyone trained this model and used it to train an LLM-based TTS? How is the performance, in terms of both wav quality and zero-shot TTS?
In line 15 of `ac.py`, the code `from ..binary import BitPacker, BitUnpacker` references `binary`, but there is no such folder or dependency package in the project.
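For context, the `binary` module referenced by that import appears to come from EnCodec's codebase (`encodec/binary.py`), which provides bit-level packing of codebook indices; vendoring that file (or an equivalent) resolves the missing import. Below is an illustrative re-implementation of what `BitPacker`/`BitUnpacker` do, not the actual EnCodec code:

```python
import io

class BitPacker:
    """Pack fixed-width integers (e.g. codebook indices) into a byte stream.
    Minimal sketch for illustration; not the EnCodec implementation."""
    def __init__(self, bits: int, fo):
        self.bits = bits
        self.fo = fo
        self._cache = 0     # bit buffer holding not-yet-written bits
        self._ncached = 0   # number of valid bits in the buffer

    def push(self, value: int):
        # Append `bits` low-order bits of `value`, then emit full bytes.
        self._cache |= value << self._ncached
        self._ncached += self.bits
        while self._ncached >= 8:
            self.fo.write(bytes([self._cache & 0xFF]))
            self._cache >>= 8
            self._ncached -= 8

    def flush(self):
        # Emit any remaining partial byte (zero-padded at the top).
        if self._ncached:
            self.fo.write(bytes([self._cache & 0xFF]))
            self._cache = 0
            self._ncached = 0

class BitUnpacker:
    """Read back fixed-width integers written by BitPacker."""
    def __init__(self, bits: int, fo):
        self.bits = bits
        self.fo = fo
        self._cache = 0
        self._ncached = 0
        self._mask = (1 << bits) - 1

    def pull(self):
        # Refill the bit buffer until one full value is available.
        while self._ncached < self.bits:
            byte = self.fo.read(1)
            if not byte:
                return None  # end of stream
            self._cache |= byte[0] << self._ncached
            self._ncached += 8
        value = self._cache & self._mask
        self._cache >>= self.bits
        self._ncached -= self.bits
        return value

# Round-trip check with 12-bit indices (a 4096-entry codebook).
buf = io.BytesIO()
packer = BitPacker(12, buf)
for v in [0, 4095, 123, 2048]:
    packer.push(v)
packer.flush()
buf.seek(0)
unpacker = BitUnpacker(12, buf)
out = [unpacker.pull() for _ in range(4)]
print(out)  # → [0, 4095, 123, 2048]
```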
When using the 40 tokens/s configuration, although the quality of the reconstructed audio is very good, there are often some mispronunciations. Have you measured the CER performance of the reconstructed...
Hi, author. I find that in the training code the commit loss weight is set to 1000, which is much higher than that of EnCodec and SpeechTokenizer. Why so large...
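For reference, the commitment term in VQ training penalizes the encoder output drifting from its chosen codebook entry, and the weight scales that term in the total loss. A minimal sketch of how the weight enters (all tensors and values here are hypothetical placeholders, not WavTokenizer's actual loss code):

```python
# total_loss = recon_loss + commit_weight * MSE(z_e, stop_grad(z_q))
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

z_e = [0.9, 1.1, -0.2]   # encoder output (gradient flows through this)
z_q = [1.0, 1.0, 0.0]    # nearest codebook entry, treated as a constant

commit_weight = 1000.0   # the large setting asked about above
recon_loss = 0.05        # placeholder reconstruction loss
commit = mse(z_e, z_q)
total = recon_loss + commit_weight * commit
print(round(commit, 4), round(total, 2))  # → 0.02 20.05
```

With a weight this large, even a small quantization mismatch dominates the total loss, which strongly pins the encoder outputs to the codebook.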
I am trying to train the model with hop size 1024, about 23 tokens per second. I only changed upsample_rates to [8,8,4,4] and num_samples to 71680. The training is running now, but...
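The numbers in the question above are self-consistent: the hop size is the product of the upsample rates, and the token rate follows from the sample rate. A quick check, assuming a 24 kHz sample rate as used by WavTokenizer:

```python
# Sanity-check the token rate implied by the changed config.
upsample_rates = [8, 8, 4, 4]
hop_size = 1
for r in upsample_rates:
    hop_size *= r                      # product of upsample rates

sample_rate = 24000                    # assumed 24 kHz, as in the paper
tokens_per_second = sample_rate / hop_size
num_samples = 71680                    # training clip length in samples
frames_per_clip = num_samples // hop_size

print(hop_size, round(tokens_per_second, 2), frames_per_clip)  # → 1024 23.44 70
```

So hop size 1024 gives roughly 23.4 tokens per second, and each 71680-sample clip yields exactly 70 frames, which is why num_samples was chosen as a multiple of the hop size.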
I'm using audio data from my own domain to continue training from the WavTokenizer-medium checkpoint. However, I found that the model seemed to get worse and...