
WIP: Float16 KV Cache in voicecraft.py

Open Ph0rk0z opened this issue 1 year ago • 3 comments

Didn't appear to break anything. Not sure how much it helps, so give it a try. I think there are some missing torch GC calls somewhere, because not all memory is always cleared. Are there other places we could use FP16? At inference time the reduced precision shouldn't matter, unlike in training.
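A minimal sketch of the two ideas above, assuming `past_kv` is a per-layer list of (key, value) tensor pairs; the helper names are illustrative, not the actual voicecraft.py code:

```python
import gc
import torch

def free_cuda_memory():
    # Explicitly drop dangling references, then return cached blocks
    # to the CUDA allocator; helps when memory "isn't always cleared".
    gc.collect()
    torch.cuda.empty_cache()

def cast_kv_cache_to_half(past_kv):
    # past_kv is assumed to be a list of (key, value) tensor pairs,
    # one per transformer layer; storing them in fp16 halves cache VRAM.
    return [(k.half(), v.half()) for k, v in past_kv]
```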

Ph0rk0z avatar Apr 05 '24 17:04 Ph0rk0z

Thanks!

Do you have an estimate of how much VRAM it uses after making the cache fp16?

With fp32, for the default example in the demo: the 830M model needs around 22 GB with kvcache on and 12 GB with kvcache off (i.e. kvcache=0); the 330M model needs 15 GB with kvcache on and 5 GB with kvcache off.
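To compare the fp16 and fp32 cache numbers directly, a hedged sketch of measuring peak VRAM around one generation pass (the generation call itself is a placeholder for whatever the demo invokes):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one generation pass here, e.g. with kvcache=1 vs kvcache=0 ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.1f} GB")
```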

In addition, can one run the entire model/operation in fp16?

jasonppy avatar Apr 05 '24 21:04 jasonppy

Loading the model with whisperX takes about 6 GB, but usage goes up during inference.

I tried adding model.half() in the model-loading code too, but there was no difference. It could be due to the batch size of 4; I think it uses less if you set it to 1 batch.
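A sketch of the two knobs mentioned above: the `.half()` cast is generic PyTorch, and `compute_type` / `batch_size` follow whisperX's load/transcribe API; the model name and audio path are placeholders:

```python
import whisperx

device = "cuda"

# VoiceCraft side: cast weights to fp16 after loading, e.g.
#   model = model.half().to(device)
# (placeholder; the real loading code lives in the repo's demo scripts)

# whisperX side: fp16 compute type and batch_size=1 instead of 4,
# which should lower peak VRAM during transcription.
asr_model = whisperx.load_model("base.en", device, compute_type="float16")
audio = whisperx.load_audio("demo.wav")  # demo.wav is a placeholder path
result = asr_model.transcribe(audio, batch_size=1)
print(result["segments"])
```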

Ph0rk0z avatar Apr 06 '24 00:04 Ph0rk0z

https://files.catbox.moe/azwyj4.mov

Here is what it does on my machine. I wonder why the CPU usage is so high as well.
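One possible culprit for the high CPU usage, offered only as a guess: PyTorch CPU ops default to using all cores, so capping the thread count is a cheap experiment (the values below are arbitrary):

```python
import torch

# Cap intra-op and inter-op CPU threads; call these early, before the
# first forward pass. If the high CPU use comes from thread
# oversubscription in CPU-side ops, this should bring it down
# without affecting GPU inference much.
torch.set_num_threads(4)
torch.set_num_interop_threads(2)
```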

Ph0rk0z avatar Apr 06 '24 00:04 Ph0rk0z