
Loading model on low CPU memory

Open barschiiii opened this issue 10 months ago • 5 comments

I am struggling to load a quantized model because I do not have enough CPU memory to hold the weights.

Usually I would split the weights into multiple shards and then load them one shard at a time.

Is this, or something similar, also possible in CTranslate2?
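
To illustrate what is meant by sharding here, a minimal sketch of the planning step follows. It is plain Python and does not use any CTranslate2 API; the function `plan_shards`, the tensor names, and the byte sizes are all hypothetical and only show the idea of grouping weights so that no group exceeds a memory budget.

```python
from typing import Dict, List


def plan_shards(tensor_sizes: Dict[str, int], max_shard_bytes: int) -> List[List[str]]:
    """Greedily group tensor names into shards no larger than max_shard_bytes.

    A single tensor larger than the limit ends up in a shard of its own.
    """
    shards: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in tensor_sizes.items():
        # Close the current shard when adding this tensor would exceed the limit.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# Hypothetical tensor names and byte sizes, for illustration only.
sizes = {"embeddings": 400, "layer_0": 300, "layer_1": 300, "output": 500}
for index, names in enumerate(plan_shards(sizes, max_shard_bytes=700)):
    print(index, names)
```

Each group could then be saved to its own file and read back one at a time, so that only one shard has to fit in CPU memory at once. Whether CTranslate2 can consume weights this way is the open question.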

barschiiii • Aug 09 '23 13:08