Puyuan Peng
Yeah, the average utterance length in GigaSpeech is 5 sec (even though many GigaSpeech utterances are longer than 15 sec), so there is dataset bias there. Finetune the model...
Thanks! Looking forward to it!
Thanks. Not sure about the validation loss. The model is overfitting on your data (if there isn't a significant train/val domain mismatch): it's getting 95+% training top10acc on codebook 1, but...
I still have the results for a slightly different model, but they should be mostly the same: 'train_top10acc_cb1': '0.5548 (0.5261)', 'train_top10acc_cb2': '0.4790 (0.4456)', 'train_top10acc_cb3': '0.4369 (0.3947)', 'train_top10acc_cb4': '0.3694 (0.3226)', 'val_top10acc_cb1': '0.5001731514930725',...
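For readers unfamiliar with the metric, here is a minimal sketch of how a top-10 accuracy such as `train_top10acc_cb1` could be computed for one codebook. It assumes the usual definition (a prediction counts as correct when the target token is among the k highest-scoring tokens); the function name `topk_accuracy` and the toy data are hypothetical and not taken from the VoiceCraft code.

```python
def topk_accuracy(logits, targets, k=10):
    """Fraction of positions whose target id is among the k highest-scoring ids.

    logits:  list of per-position score lists (one score per vocabulary entry)
    targets: list of ground-truth token ids, one per position
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    if not targets:
        return 0.0
    hits = 0
    for scores, target in zip(logits, targets):
        # Indices of the k largest scores at this position.
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += target in topk
    return hits / len(targets)


# Toy example (hypothetical data): 3 positions, 4-entry vocabulary, k=2.
logits = [
    [0.1, 0.7, 0.2, 0.0],      # top-2 ids: 1, 2
    [0.9, 0.05, 0.03, 0.02],   # top-2 ids: 0, 1
    [0.1, 0.2, 0.3, 0.4],      # top-2 ids: 3, 2
]
targets = [2, 3, 3]
acc = topk_accuracy(logits, targets, k=2)  # 2 of 3 targets fall in the top 2
```

A large gap between the training and validation values of this metric, as in the 95+% training case mentioned earlier, is what points to overfitting.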
Sure, shoot me an email.
Thanks! I'm currently resolving paper reviews, so I don't have the capacity to update this repo. But yes, the three columns are "0 name codec_number"; codec_number means how long is the...
Thanks! The username issue is reflected in a newer commit: [741a655](https://github.com/jasonppy/VoiceCraft/commit/741a6559e98c4299324ad6e9fd454fb48d8f3cae)
Try the quickstart with Docker; it should work on Windows: https://github.com/jasonppy/VoiceCraft?tab=readme-ov-file#quickstart
Thanks! Do you have an estimate of how much VRAM is needed after making the cache fp16? With fp32, for the default example in the demo, for the 830M model, it...
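As a rough back-of-the-envelope sketch of why an fp16 cache helps: a transformer key/value cache stores two tensors per layer, so its size scales linearly with bytes per element, and going from fp32 (4 bytes) to fp16 (2 bytes) halves it. The layer count, sequence length, and model width below are hypothetical placeholders, not the actual 830M configuration, and the estimate covers only the cache, not weights or activations.

```python
def kv_cache_bytes(num_layers, seq_len, d_model, bytes_per_elem, batch_size=1):
    """Approximate size in bytes of a transformer key/value cache.

    Each layer stores a key tensor and a value tensor, each of shape
    (batch_size, seq_len, d_model).
    """
    return 2 * num_layers * batch_size * seq_len * d_model * bytes_per_elem


# Hypothetical dimensions, chosen only for illustration.
dims = dict(num_layers=16, seq_len=1000, d_model=2048)

fp32_bytes = kv_cache_bytes(bytes_per_elem=4, **dims)
fp16_bytes = kv_cache_bytes(bytes_per_elem=2, **dims)

print(f"fp32 cache: {fp32_bytes / 2**20:.1f} MiB")
print(f"fp16 cache: {fp16_bytes / 2**20:.1f} MiB")
```

Actual VRAM savings will be smaller than 2x overall, since model weights and intermediate activations are unaffected by the cache dtype.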
Thanks, you are right. Fixed in https://github.com/jasonppy/VoiceCraft/commit/991b1fe3bb622698b15223df5d91eea33d79d2b9