Puyuan Peng comments

Results: 97 comments of Puyuan Peng

some Voice editing problem

Yeah, the average utterance length in GigaSpeech is 5 sec (even though many GigaSpeech utterances are >15 sec), so there is a dataset bias. Finetune the model...

Validation loss Divergence?

Thanks. Not sure about the validation loss. The model is overfitting on your data (if there isn't a significant train/val domain mismatch): it's getting 95+% training top10acc on codebook 1, but...

I still have the results for a slightly different model, but they should mostly be the same: 'train_top10acc_cb1': '0.5548 (0.5261)', 'train_top10acc_cb2': '0.4790 (0.4456)', 'train_top10acc_cb3': '0.4369 (0.3947)', 'train_top10acc_cb4': '0.3694 (0.3226)', 'val_top10acc_cb1': '0.5001731514930725',...
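
For readers unfamiliar with the metric names, the sketch below shows one common way to compute a top-k accuracy: the fraction of positions whose target index is among the k highest-scoring predictions. It assumes that `top10acc` means top-10 accuracy in this sense; the function `top_k_accuracy` and the toy scores are hypothetical and are not taken from the repository.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 10,
) -> float:
    """Fraction of rows whose target index is among the k largest scores.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        return 0.0
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k highest-scoring classes for this position.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy example with 3 positions and 3 classes (made-up numbers).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_targets = [1, 1, 0]
print(top_k_accuracy(toy_scores, toy_targets, k=1))
print(top_k_accuracy(toy_scores, toy_targets, k=2))
```

A large gap between the training and validation values of such a metric is the usual sign of the overfitting mentioned in the previous comment.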

FluentSpeech model trained on GigaSpeech

Sure, shoot me an email.

train.txt and validation.txt generation from extracted_codes_and_phonemes

Thanks! I'm currently resolving paper reviews, so I wouldn't have the capacity to update this repo. But yes, the three columns are "0 name codec_number"; codec_number means how long is the...
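
A minimal sketch of reading one line in the "0 name codec_number" layout described above. The whitespace separator, the assumption that `codec_number` is an integer, the helper name `parse_manifest_line`, and the example values are all assumptions for illustration; check the actual files before relying on any of them.

```python
from typing import NamedTuple


class ManifestEntry(NamedTuple):
    first_column: str  # the leading "0" in the layout described above
    name: str
    codec_number: int  # assumed to be an integer


def parse_manifest_line(line: str) -> ManifestEntry:
    """Split one manifest line into its three columns (hypothetical helper)."""
    parts = line.split()  # assumption: columns are whitespace-separated
    if len(parts) != 3:
        raise ValueError(f"expected 3 columns, got {len(parts)}: {line!r}")
    first_column, name, codec_number = parts
    return ManifestEntry(first_column, name, int(codec_number))


# Made-up example line; the name and number are placeholders.
entry = parse_manifest_line("0 example_utterance 412")
print(entry.name, entry.codec_number)
```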

Error in loading your tuned EnCodec from Huggingface

Thanks! The username issue is addressed in a newer commit [741a655](https://github.com/jasonppy/VoiceCraft/commit/741a6559e98c4299324ad6e9fd454fb48d8f3cae)

espeak not working as backend on Windows OS

Try the quickstart with Docker; it should work on Windows: https://github.com/jasonppy/VoiceCraft?tab=readme-ov-file#quickstart

WIP: Float16 KV Cache in voicecraft.py

Thanks! Do you have an estimate of how much VRAM is used after making the cache fp16? With fp32, for the default example in the demo, for the 830M model, it...
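
As a back-of-the-envelope illustration of the question above, the memory held by a key/value cache scales linearly with the element size, so moving from fp32 to fp16 halves it. The sketch below uses the standard estimate (two tensors, keys and values, per layer); the layer count, sequence length, and hidden size are hypothetical placeholders, not the 830M model's actual configuration, and the estimate ignores weights and activations.

```python
def kv_cache_bytes(
    num_layers: int,
    seq_len: int,
    hidden_dim: int,
    bytes_per_element: int,
    batch_size: int = 1,
) -> int:
    """Estimate KV cache size: keys and values stored for every layer and position."""
    return 2 * num_layers * batch_size * seq_len * hidden_dim * bytes_per_element


# Hypothetical configuration, chosen only to make the arithmetic concrete.
NUM_LAYERS, SEQ_LEN, HIDDEN_DIM = 16, 1000, 2048

fp32_bytes = kv_cache_bytes(NUM_LAYERS, SEQ_LEN, HIDDEN_DIM, bytes_per_element=4)
fp16_bytes = kv_cache_bytes(NUM_LAYERS, SEQ_LEN, HIDDEN_DIM, bytes_per_element=2)

print(f"fp32: {fp32_bytes / 2**20:.1f} MiB")
print(f"fp16: {fp16_bytes / 2**20:.1f} MiB")
```

Measured VRAM can differ from this estimate because of allocator overhead and other buffers, so an actual measurement is still the reliable answer.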

seed - magic number

Thanks, you are right. Fixed in https://github.com/jasonppy/VoiceCraft/commit/991b1fe3bb622698b15223df5d91eea33d79d2b9