
Question: VRAM requirements for training, finetuning, and inference?

Open ProjectProgramAMark opened this issue 1 year ago • 3 comments

Do we have a general sense of this? Has LoRA/QLoRA fine-tuning been attempted on this, and if so, any guidance?

ProjectProgramAMark avatar Apr 06 '24 16:04 ProjectProgramAMark

Thanks

Inference: for the default example in the demo (the one in inference_tts.ipynb), the 830M model needs around 22GB with kvcache on (i.e. kvcache=1) and 12GB with kvcache off; the 330M model needs 15GB with kvcache on and 5GB with kvcache off.
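For intuition on why kvcache trades VRAM for speed, here is a back-of-envelope estimate of cache size. All dimensions below are illustrative assumptions, not VoiceCraft's actual config, and the real on-GPU gap also includes activations and allocator overhead:

```python
# KV cache stores one key vector and one value vector per layer per token:
#     bytes = layers * 2 * seq_len * hidden_dim * bytes_per_elem
layers, hidden_dim, seq_len = 16, 2048, 4096  # assumed, for illustration
bytes_per_elem = 2  # fp16

cache_bytes = layers * 2 * seq_len * hidden_dim * bytes_per_elem
print(f"{cache_bytes / 2**30:.2f} GiB")  # prints "0.50 GiB"
```

The cache grows linearly with sequence length, which is why long prompts with kvcache=1 cost noticeably more VRAM.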

Training: 48GB

LoRA is not used so far

jasonppy avatar Apr 06 '24 16:04 jasonppy

Awesome, thank you for the quick response! I'm hoping to see some LoRA/QLoRA action on this soon. I think something like being able to switch out adapter weights on a base model and having different voices come out of it is something that would be so cool to see. I will try and push that through myself if I have the time (if you have any recommendations on which layers to apply it to that would be great!), but regardless I think this is awesome and I'm excited to start playing around with it
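The adapter-swapping idea above can be sketched in a few lines. This is a minimal, framework-free LoRA illustration (NumPy standing in for the real model; all names and dimensions here are hypothetical, not VoiceCraft's code):

```python
import numpy as np

# LoRA keeps the base weight W frozen and learns a low-rank update:
#     W_eff = W + (alpha / r) * B @ A
rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16  # assumed layer width and adapter rank

W = rng.standard_normal((d, d))         # frozen base weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, init 0

def lora_forward(x):
    # Base path plus low-rank adapter path.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d))
# With B initialized to zero, the adapter starts as an exact no-op:
assert np.allclose(lora_forward(x), x @ W.T)

# Swapping voices would mean swapping the small (A, B) pair on a shared
# base: 2*d*r adapter parameters instead of d*d for a full fine-tune.
print(2 * d * r, "adapter params vs", d * d, "full")
```

In practice one would apply this to the attention projections (commonly the query/value linears) via a library like Hugging Face PEFT rather than by hand.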

ProjectProgramAMark avatar Apr 06 '24 16:04 ProjectProgramAMark

hello

Potatooff avatar Nov 18 '24 17:11 Potatooff