VoiceCraft
WIP: Float16 KV Cache in voicecraft.py
It didn't appear to break anything, but I'm not sure how much it helps; give it a try. I think some torch garbage-collection calls are missing somewhere, because memory isn't always fully released. Are there other places we could use FP16? At inference time the reduced precision shouldn't matter, unlike in training.
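A minimal sketch of what I mean, assuming the usual PyTorch caching-allocator behavior (the helper name and cache shapes are illustrative, not the actual voicecraft.py code):

```python
import gc
import torch

# Hypothetical helper (not in the repo): explicitly free cached CUDA
# memory between generations. Dropping Python references alone is not
# enough, because PyTorch's caching allocator keeps blocks around
# until empty_cache() is called.
def free_cuda_memory():
    gc.collect()              # collect dangling Python references first
    torch.cuda.empty_cache()  # then return cached blocks to the driver

# The kv-cache change itself boils down to storing keys/values in half
# precision; shapes here are illustrative, not the real tensors:
k_cache = torch.zeros(1, 8, 0, 64, dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)
```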
Thanks!
Do you have an estimate of how much VRAM it needs after making the cache fp16?
With fp32, for the default example in the demo:

| Model | kvcache on | kvcache off (kvcache=0) |
|-------|------------|-------------------------|
| 830M  | ~22 GB     | ~12 GB                  |
| 330M  | ~15 GB     | ~5 GB                   |
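If you want to check these numbers on your own setup, PyTorch's built-in memory stats make the peak easy to read back (a generic sketch, not code from the repo):

```python
import torch

# Reset the high-water mark, run one generation, then read the peak.
torch.cuda.reset_peak_memory_stats()

# ... run the demo inference here, with kvcache on or off ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.1f} GB")
```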
In addition, can one run the entire model/operation in fp16?
Loading the model with whisperX takes about 6 GB, but usage goes up during inference.
I tried adding model.half() in the model-loading code too, but it made no difference. It could be due to the batch size of 4; I think it uses less memory if you set it to a batch size of 1.
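For reference, here is roughly how the whisperX side could be run in fp16 with a batch size of 1, following whisperX's documented load_model/transcribe usage (model name and file path are just examples):

```python
import whisperx

device = "cuda"

# compute_type="float16" makes the faster-whisper backend run in fp16;
# this is separate from calling model.half() on the VoiceCraft model.
model = whisperx.load_model("large-v2", device, compute_type="float16")

audio = whisperx.load_audio("input.wav")  # example path

# batch_size controls how many 30 s chunks are decoded at once;
# dropping it to 1 trades speed for lower peak VRAM.
result = model.transcribe(audio, batch_size=1)
```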
https://files.catbox.moe/azwyj4.mov
Here is what it does on my machine. I wonder why the CPU usage is so high as well.