
Distributed mode

Open EnricoBeltramo opened this issue 1 year ago • 6 comments

Is there, or is it planned, a way to run inference in distributed mode across multiple different machines?

EnricoBeltramo avatar Aug 12 '23 10:08 EnricoBeltramo

Can you specify what you mean exactly? Do you mean splitting the model on multiple machines (model/tensor parallelism), or loading the same model on multiple machines (data parallelism)?

guillaumekln avatar Aug 17 '23 13:08 guillaumekln

I mean model parallelism, i.e. for when the model is too big to fit in a single machine's VRAM.

EnricoBeltramo avatar Aug 20 '23 09:08 EnricoBeltramo

Yes, I also want to know whether this is possible. Can we have something similar to what `device_map="auto"` does?

MrigankRaman avatar Aug 27 '23 18:08 MrigankRaman

As far as I know, `device_map="auto"` will not load a model on multiple machines. To load the model on multiple GPUs (on the same machine), see the existing issue #1052.

guillaumekln avatar Aug 28 '23 08:08 guillaumekln

I closed this issue as completed. Tensor parallelism is now supported in CTranslate2.

minhthuc2502 avatar Apr 23 '24 07:04 minhthuc2502
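
For readers landing here later: the CTranslate2 documentation describes enabling tensor parallelism by passing `tensor_parallel=True` and launching the script under MPI (one process per GPU; a hostfile can spread the ranks over several machines). A minimal sketch, where the model path and input tokens are placeholders:

```python
# Sketch based on the CTranslate2 tensor-parallelism documentation.
# "ende_ctranslate2/" is a placeholder path to a converted model.
# Launch under MPI so each rank drives one GPU, e.g.:
#   mpirun -np 2 python run_tp.py
# (use an MPI hostfile to distribute ranks across multiple machines)
import ctranslate2

translator = ctranslate2.Translator(
    "ende_ctranslate2/",   # placeholder model directory
    device="cuda",
    tensor_parallel=True,  # shard the model weights across the ranks
)

results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])
```

This is a usage sketch, not a tested script: it requires a CUDA build of CTranslate2, an MPI runtime, and a converted model on disk.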