WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Looking forward to your unified model.
In Section 3.2 of the paper, you presented the probability density for each index in the codebook. Could you explain how this was achieved? Also, during this process, were you...
Hey, I am Christoph, one of the co-founders of LAION. We are working on open-source models like GPT-4o and are looking for a better audio codec than SNAC, which...
Has anyone trained this model and used it to train an LLM-based TTS? How is the performance, in terms of both wav quality and zero-shot TTS?
In line 15 of `ac.py`, the code `from ..binary import BitPacker, BitUnpacker` references `binary`, but there is no such folder or dependency package in the project.
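For context, the `binary` module referenced by that import appears to come from EnCodec's codebase (`encodec/binary.py`), which provides bit-level packing of codebook indices; vendoring that file (or an equivalent) resolves the missing import. Below is an illustrative re-implementation of what `BitPacker`/`BitUnpacker` do, not the actual EnCodec code:

```python
import io

class BitPacker:
    """Pack fixed-width integers (e.g. codebook indices) into a byte stream.
    Minimal sketch for illustration; not the EnCodec implementation."""
    def __init__(self, bits: int, fo):
        self.bits = bits
        self.fo = fo
        self._cache = 0     # bit buffer holding not-yet-written bits
        self._ncached = 0   # number of valid bits in the buffer

    def push(self, value: int):
        # Append `bits` low-order bits of `value`, then emit full bytes.
        self._cache |= value << self._ncached
        self._ncached += self.bits
        while self._ncached >= 8:
            self.fo.write(bytes([self._cache & 0xFF]))
            self._cache >>= 8
            self._ncached -= 8

    def flush(self):
        # Emit any remaining partial byte (zero-padded at the top).
        if self._ncached:
            self.fo.write(bytes([self._cache & 0xFF]))
            self._cache = 0
            self._ncached = 0

class BitUnpacker:
    """Read back fixed-width integers written by BitPacker."""
    def __init__(self, bits: int, fo):
        self.bits = bits
        self.fo = fo
        self._cache = 0
        self._ncached = 0
        self._mask = (1 << bits) - 1

    def pull(self):
        # Refill the bit buffer until one full value is available.
        while self._ncached < self.bits:
            byte = self.fo.read(1)
            if not byte:
                return None  # end of stream
            self._cache |= byte[0] << self._ncached
            self._ncached += 8
        value = self._cache & self._mask
        self._cache >>= self.bits
        self._ncached -= self.bits
        return value

# Round-trip check with 12-bit indices (a 4096-entry codebook).
buf = io.BytesIO()
packer = BitPacker(12, buf)
for v in [0, 4095, 123, 2048]:
    packer.push(v)
packer.flush()
buf.seek(0)
unpacker = BitUnpacker(12, buf)
out = [unpacker.pull() for _ in range(4)]
print(out)  # → [0, 4095, 123, 2048]
```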
When using the 40 tokens/s configuration, although the quality of the reconstructed audio is very good, there are often some mispronunciations. Have you measured the CER performance of the reconstructed...
Hi, author. I find that in the training code the commit loss weight is set to 1000, which is much higher than that of EnCodec and SpeechTokenizer. Why so large...
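For reference, the commitment term in VQ training penalizes the encoder output drifting from its chosen codebook entry, and the weight scales that term in the total loss. A minimal sketch of how the weight enters (all tensors and values here are hypothetical placeholders, not WavTokenizer's actual loss code):

```python
# total_loss = recon_loss + commit_weight * MSE(z_e, stop_grad(z_q))
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

z_e = [0.9, 1.1, -0.2]   # encoder output (gradient flows through this)
z_q = [1.0, 1.0, 0.0]    # nearest codebook entry, treated as a constant

commit_weight = 1000.0   # the large setting asked about above
recon_loss = 0.05        # placeholder reconstruction loss
commit = mse(z_e, z_q)
total = recon_loss + commit_weight * commit
print(round(commit, 4), round(total, 2))  # → 0.02 20.05
```

With a weight this large, even a small quantization mismatch dominates the total loss, which strongly pins the encoder outputs to the codebook.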
I am trying to train the model with hop size 1024, about 23 tokens per second. I only changed upsample_rates to [8,8,4,4] and num_samples to 71680. The training is running now, but...
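The numbers in the question above are self-consistent: the hop size is the product of the upsample rates, and the token rate follows from the sample rate. A quick check, assuming a 24 kHz sample rate as used by WavTokenizer:

```python
# Sanity-check the token rate implied by the changed config.
upsample_rates = [8, 8, 4, 4]
hop_size = 1
for r in upsample_rates:
    hop_size *= r                      # product of upsample rates

sample_rate = 24000                    # assumed 24 kHz, as in the paper
tokens_per_second = sample_rate / hop_size
num_samples = 71680                    # training clip length in samples
frames_per_clip = num_samples // hop_size

print(hop_size, round(tokens_per_second, 2), frames_per_clip)  # → 1024 23.44 70
```

So hop size 1024 gives roughly 23.4 tokens per second, and each 71680-sample clip yields exactly 70 frames, which is why num_samples was chosen as a multiple of the hop size.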
I'm using audio data from my own domain to continue training from the WavTokenizer-medium checkpoint. However, I found that the model seemed to get worse and...