CTranslate2 Loading model on low CPU memory

Open barschiiii opened this issue 10 months ago • 5 comments

I am struggling to load a quantized model because I do not have enough CPU memory to hold the weights.

Usually I would split the weights into multiple shards and then load them one shard at a time.
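To illustrate what I mean by sharding, here is a minimal sketch in plain Python. It is not CTranslate2 API: `save_shards` and `iter_shards` are hypothetical helper names, the weights are stand-in raw byte buffers rather than real tensors, and pickle files stand in for whatever shard format a framework would actually use.

```python
import os
import pickle
import tempfile


def save_shards(weights, directory, max_bytes):
    """Write a {name: bytes} mapping into shard files of at most ~max_bytes each."""
    paths, current, size = [], {}, 0

    def flush():
        nonlocal current, size
        path = os.path.join(directory, f"shard_{len(paths)}.pkl")
        with open(path, "wb") as f:
            pickle.dump(current, f)
        paths.append(path)
        current, size = {}, 0

    for name, blob in weights.items():
        # Start a new shard when adding this weight would exceed the budget.
        if current and size + len(blob) > max_bytes:
            flush()
        current[name] = blob
        size += len(blob)
    if current:
        flush()
    return paths


def iter_shards(paths):
    """Yield one shard at a time, so only one shard is resident in CPU memory."""
    for path in paths:
        with open(path, "rb") as f:
            yield pickle.load(f)


if __name__ == "__main__":
    # Hypothetical stand-in weights: four int8-like buffers of 1000 bytes each.
    weights = {f"layer{i}.weight": bytes(1000) for i in range(4)}
    with tempfile.TemporaryDirectory() as directory:
        paths = save_shards(weights, directory, max_bytes=2000)
        for shard in iter_shards(paths):
            # In a real setup, each shard would be moved to the device here
            # and then released before the next one is read.
            print(sorted(shard))
```

The point of the generator is that peak CPU memory is bounded by the largest shard instead of the whole model.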

Is this, or something similar, also possible in CTranslate2?

Aug 09 '23 13:08 barschiiii