tortoise-tts
Int 4 quantization?
Hello,
I'm relatively new here, and I was wondering whether 4-bit quantization could be used to speed up inference.
I know that tortoise-tts-fast supports half precision, but is it possible to quantize the models further?
Thank you
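For readers unfamiliar with the term, here is a minimal sketch of what 4-bit quantization means for a set of weights: each float is mapped to one of 16 integer levels plus a shared scale factor. This is a generic illustration in plain Python, not code from tortoise-tts or tortoise-tts-fast; the function names `quantize_int4` and `dequantize_int4` and the sample weights are hypothetical.

```python
# Illustrative only: symmetric 4-bit quantization of a list of float weights.
# Not taken from tortoise-tts or tortoise-tts-fast; all names are hypothetical.

def quantize_int4(weights):
    """Map floats to integer codes in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Largest magnitude maps to code 7; fall back to 1.0 for all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int4(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.07, -0.21]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Storing weights this way shrinks them to a quarter of their half-precision size. Whether inference actually gets faster likely depends on the runtime having kernels that operate on the packed 4-bit values, which is what projects built on ggml aim to provide.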
I'm also interested in whether this works!
I wonder if @ggerganov could make tortoise.cpp?
Related to https://github.com/ggerganov/ggml/issues/59
https://github.com/balisujohn/tortoise.cpp
tortoise.cpp implementation underway :) contributions are welcome!