Data Parallel across multiple GPUs?
Is it possible to initialize the model in a data-parallel setup, i.e., stream each layer to multiple GPUs at the same time (one replica per GPU, each handling a slice of the batch), so that high batch sizes can be served?
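
For concreteness, here is roughly what I have in mind: one independent airllm replica pinned to each GPU, each replica streaming layers on its own and serving a shard of a large batch. This is only a sketch of the idea, not working code; `AutoModel.from_pretrained` is the entry point from the README, the Platypus2-70B checkpoint is just a placeholder, and I'm assuming each replica can be pinned to a single GPU via `CUDA_VISIBLE_DEVICES`.

```python
import os
import torch.multiprocessing as mp

def worker(rank: int, prompts: list[str]) -> None:
    # Pin this replica to one GPU before any CUDA work happens, so that
    # airllm streams its layers onto that device only (assumption: airllm
    # simply uses the default visible CUDA device).
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    from airllm import AutoModel  # imported after pinning the device

    model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")
    # Llama tokenizers have no pad token by default; reuse EOS for batching.
    model.tokenizer.pad_token = model.tokenizer.eos_token
    tokens = model.tokenizer(prompts, return_tensors="pt", padding=True)
    out = model.generate(tokens["input_ids"].cuda(), max_new_tokens=20)
    print(f"[replica {rank}]", model.tokenizer.batch_decode(out))

if __name__ == "__main__":
    mp.set_start_method("spawn")  # safest start method when CUDA is involved
    prompts = ["What is the capital of the United States?"] * 8
    n_gpus = 2
    # Split the large batch round-robin into one shard per GPU/replica.
    shards = [prompts[i::n_gpus] for i in range(n_gpus)]
    procs = [mp.Process(target=worker, args=(rank, shard))
             for rank, shard in enumerate(shards)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Ideally each layer would be read from disk once and broadcast to all replicas, rather than every process re-reading it independently, but I don't see a way to do that with the current API. Is something like this supported or planned?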