CTranslate2
Loading a model with low CPU memory
I am struggling to load a quantized model because the machine does not have enough CPU memory to hold all of the weights.
Usually I would split the weights into multiple shards and then load them one shard at a time.
Is this, or something similar, also possible in CTranslate2?
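For context, the shard-and-load pattern described above can be sketched generically in Python with NumPy. This is not a CTranslate2 feature or API. The helper names `shard_weights` and `iter_sharded_weights`, the `.npz` shard format and the `index.json` layout are all hypothetical choices made for illustration. The idea is to group named tensors into files under a byte budget, then open only one shard file at a time when loading.

```python
import json
import os
import tempfile

import numpy as np


def shard_weights(weights, out_dir, max_shard_bytes):
    """Hypothetical helper: split a dict of named arrays into .npz shards.

    Arrays are grouped in order until adding the next one would exceed
    max_shard_bytes; a single array larger than the budget gets its own shard.
    Returns the number of shards written.
    """
    os.makedirs(out_dir, exist_ok=True)
    shards, current, current_bytes = [], {}, 0
    for name, array in weights.items():
        if current and current_bytes + array.nbytes > max_shard_bytes:
            shards.append(current)
            current, current_bytes = {}, 0
        current[name] = array
        current_bytes += array.nbytes
    if current:
        shards.append(current)

    # Write each shard and record which file holds which tensor.
    index = {}
    for i, shard in enumerate(shards):
        filename = f"shard-{i:05d}.npz"
        np.savez(os.path.join(out_dir, filename), **shard)
        for name in shard:
            index[name] = filename
    with open(os.path.join(out_dir, "index.json"), "w") as f:
        json.dump(index, f)
    return len(shards)


def iter_sharded_weights(out_dir):
    """Hypothetical helper: yield (name, array) pairs, one shard file at a time.

    Only the shard currently being read is open; the caller decides how long
    each yielded array stays in memory.
    """
    with open(os.path.join(out_dir, "index.json")) as f:
        index = json.load(f)
    for filename in sorted(set(index.values())):
        with np.load(os.path.join(out_dir, filename)) as shard:
            for name in shard.files:
                yield name, shard[name]


if __name__ == "__main__":
    # Toy "model": four 64x64 float32 tensors (16 KiB each).
    rng = np.random.default_rng(0)
    weights = {f"layer{i}.weight": rng.standard_normal((64, 64)).astype(np.float32)
               for i in range(4)}
    with tempfile.TemporaryDirectory() as d:
        n = shard_weights(weights, d, max_shard_bytes=40_000)
        for name, array in iter_sharded_weights(d):
            print(name, array.shape)
        print("shards:", n)
```

Whether CTranslate2 itself can consume weights this way is exactly the open question of this post; the sketch only shows the general technique being asked about.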