josephwong14wkh
Hi, I have encountered this error as well. Have you managed to fix it?
I have tested the speed of language detection. It seems that faster-whisper is slower than the original OpenAI Whisper at language ID. Here is my setup: **model size:** medium **compute type:**...
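For reference, a minimal sketch of the kind of timing comparison I mean (the audio clip and the `float16` compute type here are placeholders, not the exact values from my run):

```python
import time

import whisper                      # openai-whisper
from faster_whisper import WhisperModel

AUDIO_PATH = "sample.wav"           # placeholder audio clip

# --- faster-whisper ------------------------------------------------------
fw_model = WhisperModel("medium", compute_type="float16")  # compute type is a placeholder
t0 = time.perf_counter()
_, info = fw_model.transcribe(AUDIO_PATH)   # language ID runs eagerly; segments decode lazily
print(f"faster-whisper: {info.language} ({info.language_probability:.2f}) "
      f"in {time.perf_counter() - t0:.2f}s")

# --- original OpenAI Whisper ---------------------------------------------
ow_model = whisper.load_model("medium")
audio = whisper.pad_or_trim(whisper.load_audio(AUDIO_PATH))
mel = whisper.log_mel_spectrogram(audio).to(ow_model.device)
t0 = time.perf_counter()
_, probs = ow_model.detect_language(mel)
lang = max(probs, key=probs.get)
print(f"openai-whisper: {lang} ({probs[lang]:.2f}) in {time.perf_counter() - t0:.2f}s")
```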
Yes, we need to train the content tokenizer ourselves in order to use audio as input to the AR model.
@RMSnow Thank you for your detailed explanation. Also, may I know why you set `model.coco.codebook_dim = 8` in `contentstyle_fvq16384_12.5hz.json`? It seems quite small. As far as I know, it is the dimension...
Thanks for the recommendations. I'll definitely check them out to dive deeper! I have another question about the training loss. I am training the tokenizer from scratch with my own...
Got it! Thank you very much!
In my case, just setting "**emilia**" in "**dataset**" to 0 is enough, with "**use_emilia_dataset**" set to true. When "emilia" in "dataset" is set to 0, it will only load...
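For reference, a rough sketch of that change, assuming the exp config keeps per-dataset weights under `dataset` and a top-level `use_emilia_dataset` flag (the path and exact layout are illustrative and may differ in your config):

```python
import json

CONFIG_PATH = "exp_config.json"       # placeholder path to the experiment config

with open(CONFIG_PATH) as f:
    cfg = json.load(f)

cfg["dataset"]["emilia"] = 0          # weight 0: emilia samples are not drawn
cfg["use_emilia_dataset"] = True      # but the emilia loading pipeline stays enabled

with open(CONFIG_PATH, "w") as f:
    json.dump(cfg, f, indent=2)
```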
For my case, I did some more preprocessing steps on the data in the Emilia dataset. So I got the numpy array from Hugging Face, ran the preprocessing steps, and saved the...
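Roughly, the flow looked like the sketch below; the dataset repo name, the column names, the `preprocess` function, and the output layout are illustrative placeholders rather than my exact code:

```python
import os

import numpy as np
from datasets import load_dataset

OUT_DIR = "emilia_processed"          # placeholder output directory
os.makedirs(OUT_DIR, exist_ok=True)

# Repo name assumed; the actual schema of the Hugging Face release may differ.
ds = load_dataset("amphion/Emilia-Dataset", split="train", streaming=True)

def preprocess(wav: np.ndarray, sr: int) -> np.ndarray:
    # Placeholder for the extra preprocessing steps (e.g. resampling / trimming).
    return wav

for i, item in enumerate(ds):
    wav = item["audio"]["array"]          # column names assumed (standard Audio feature)
    sr = item["audio"]["sampling_rate"]
    np.save(os.path.join(OUT_DIR, f"{i:08d}.npy"), preprocess(wav, sr))
```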