VoiceCraft

Add option for fp16 kv cache

Open deufs opened this issue 1 year ago • 3 comments

Storing the KV cache in fp16 instead of fp32 would halve the cache's memory footprint, without significant loss of quality.

deufs avatar Apr 02 '24 01:04 deufs
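For a rough sense of scale, the cache size can be estimated from the model shape. The sketch below is back-of-the-envelope arithmetic only; the layer count, head count, head dimension, and sequence length are hypothetical placeholders, not VoiceCraft's actual configuration.

```python
def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem):
    """Estimate KV cache size in bytes for a decoder-only transformer."""
    # Factor 2: each layer stores one tensor for keys and one for values.
    return 2 * layers * batch * heads * seq_len * head_dim * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
cfg = dict(layers=16, heads=16, head_dim=128, seq_len=2048, batch=1)

fp32 = kv_cache_bytes(**cfg, bytes_per_elem=4)  # float32 is 4 bytes per element
fp16 = kv_cache_bytes(**cfg, bytes_per_elem=2)  # float16 is 2 bytes per element

print(f"fp32: {fp32 / 2**20:.0f} MiB, fp16: {fp16 / 2**20:.0f} MiB")
```

The saving scales linearly with sequence length and batch size, so it matters most for long generations. Whether it is noticeable overall depends on how large the cache is relative to the weights and activations.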

I changed the fp32 dtypes near the KV cache to fp16 and didn't notice any loss in quality. Unfortunately, memory usage stayed about the same. A lot of other calculations are also done in fp32 for some reason. I haven't tried replacing every fp32 with fp16 yet.

Ph0rk0z avatar Apr 03 '24 14:04 Ph0rk0z

Where else did you change it? I both called model.half() and changed the float32s to float16s. Memory can still spike.

Ph0rk0z avatar Apr 04 '24 12:04 Ph0rk0z
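One way to check what is still held in fp32 after a change like this is to total parameter and buffer bytes per dtype. The following is a minimal PyTorch sketch; `dtype_byte_report` and the toy model are hypothetical stand-ins and are not part of the VoiceCraft code.

```python
from collections import Counter

import torch
from torch import nn


def dtype_byte_report(module: nn.Module) -> Counter:
    """Sum the bytes held by parameters and buffers, grouped by dtype."""
    report = Counter()
    for tensor in list(module.parameters()) + list(module.buffers()):
        report[tensor.dtype] += tensor.numel() * tensor.element_size()
    return report


# Hypothetical stand-in model; swap in the real model object to audit it.
toy = nn.Sequential(nn.Linear(64, 64), nn.LayerNorm(64))

before = dtype_byte_report(toy)         # everything starts as float32
after = dtype_byte_report(toy.half())   # .half() casts floating-point params and buffers

print("before:", dict(before))
print("after: ", dict(after))
```

Note that `.half()` only converts parameters and buffers. Tensors created during the forward pass with an explicit `dtype=torch.float32` stay in fp32, which may be one reason memory still spikes. On a CUDA device, calling `torch.cuda.reset_peak_memory_stats()` before generation and `torch.cuda.max_memory_allocated()` after it should show how large the spike actually is.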