Reduce LM memory usage

Open levinkhho opened this issue 1 year ago • 0 comments

If CUDA is available, load the language model in 8-bit quantized format using bitsandbytes. Otherwise, load the LM in torch.float16.
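A rough sketch of that branching, assuming the LM is loaded through Hugging Face `transformers` with `from_pretrained` (the issue does not say which loader the repo uses). The helper names `choose_load_plan` and `load_lm`, and the use of `AutoModel`, are hypothetical and only for illustration.

```python
from typing import Any, Dict


def choose_load_plan(cuda_available: bool) -> Dict[str, Any]:
    """Pick a precision plan; plain values so it can be inspected without torch."""
    if cuda_available:
        # bitsandbytes 8-bit kernels need a CUDA device.
        return {"mode": "int8", "device_map": "auto"}
    # No CUDA: fall back to half precision, as proposed in the issue.
    return {"mode": "float16"}


def load_lm(model_name: str):
    """Load the LM with the chosen plan (hypothetical helper, not the repo's API)."""
    # Heavy imports are kept local so the plan helper stays importable on its own.
    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    plan = choose_load_plan(torch.cuda.is_available())
    if plan["mode"] == "int8":
        return AutoModel.from_pretrained(
            model_name,
            quantization_config=BitsAndBytesConfig(load_in_8bit=True),
            device_map=plan["device_map"],
        )
    return AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)
```

Splitting the decision from the load keeps the branch easy to check on a machine without a GPU. The 8-bit path additionally requires the `bitsandbytes` and `accelerate` packages to be installed.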

One could also look into using CTranslate2 for quantization, which would work on CPU.

https://github.com/apple/ml-mdm/issues/47

levinkhho · Dec 13 '24 19:12