
Initialize Models on GPU

Open jbloxham opened this issue 2 years ago • 1 comment

This might not be easily possible.

In PyTorch, nn.Module instances always seem to get initialized on the CPU, not the GPU, even though it is possible to initialize tensors directly on the GPU. GPU initialization is much faster, and we're already starting to see measurably painful slow startups for LLMs. We should investigate whether there's a way to initialize models faster. The ideal would probably be to provide some sort of context manager that hacks PyTorch into initializing things on the GPU.
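A minimal illustration of the asymmetry described above: tensor factory functions accept `device=`, but a plain module constructor allocates on CPU and must be copied over afterwards. (The `device` fallback here is just so the snippet runs on machines without a GPU.)

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensors can be created directly on a target device...
t = torch.randn(4, 4, device=device)

# ...but a plain nn.Module constructor allocates its parameters on CPU,
# and only an explicit .to(device) copies them over afterwards.
m = nn.Linear(4, 4)
assert m.weight.device.type == "cpu"
m = m.to(device)
```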

jbloxham avatar Mar 08 '22 18:03 jbloxham

It seems like PyTorch devs keep trying to avoid adding a device context manager... not entirely sure I agree with them, but that's that :(

On the bright side, most (maybe all) PyTorch modules accept a `device=...` argument in their constructor and will then create their tensors directly on the target device. So if we built our own model classes with a `device=...` argument that propagates to all submodules, the model could be initialized directly on the GPU.
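A sketch of the propagation idea, using a hypothetical `TinyModel` class (not part of Composer) that forwards `device=` to every submodule so parameters are allocated on the target device at construction time, with no CPU round trip:

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Hypothetical model whose constructor forwards device= to all submodules."""
    def __init__(self, device=None):
        super().__init__()
        self.fc1 = nn.Linear(8, 8, device=device)
        self.fc2 = nn.Linear(8, 2, device=device)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyModel(device=device)

# Every parameter was created on the target device directly.
assert all(p.device.type == device for p in model.parameters())
```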

abhi-mosaic avatar Mar 15 '22 18:03 abhi-mosaic

If a model is constructed on the meta device, the Trainer will correctly initialize it on GPU when one is specified.

mvpatel2000 avatar Nov 03 '22 15:11 mvpatel2000