ColossalAI
[BUG]: RuntimeError: Failed to replace block_sparse_moe of type MixtralSparseMoeBlock with EPMixtralSparseMoeBlock with the exception: CUDA out of memory
🐛 Describe the bug
When training the Mixture of Experts (MoE) model with the code in applications/ColossalMoE, I run into an out-of-memory (OOM) error right at the beginning, before any training step, while booster.boost() is sharding the model.
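For reference, here is a minimal sketch of my setup (simplified from applications/ColossalMoE, not my exact debug.py; the optimizer, dtype, and precision values are placeholders, and the application's custom Mixtral policy is omitted):

```python
# Simplified sketch of the setup that triggers the error below (assumptions noted inline).
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin.moe_hybrid_parallel_plugin import MoeHybridParallelPlugin
from colossalai.lazy import LazyInitContext
from colossalai.nn.optimizer import HybridAdam
from transformers import MixtralForCausalLM

colossalai.launch_from_torch(config={})

# Weights stay lazy here; they are only materialized per rank when shardformer
# replaces each MixtralSparseMoeBlock with EPMixtralSparseMoeBlock during boost().
with LazyInitContext():
    model = MixtralForCausalLM.from_pretrained(
        "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
    )

plugin = MoeHybridParallelPlugin(
    tp_size=1,            # values from the environment section below
    pp_size=1,
    ep_size=8,
    precision="bf16",     # placeholder; the custom Mixtral policy is omitted here
)
booster = Booster(plugin=plugin)
optimizer = HybridAdam(model.parameters(), lr=1e-5)  # placeholder optimizer/lr

# The OOM below is raised inside this call, while materializing the MoE experts.
model, optimizer, *_ = booster.boost(model, optimizer)
```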
RuntimeError: Failed to replace block_sparse_moe of type MixtralSparseMoeBlock with EPMixtralSparseMoeBlock with the exception: CUDA out of memory. Tried to allocate 112.00 MiB.
Full traceback:
Traceback (most recent call last):
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 197, in _replace_sub_module
    replace_layer = target_module.from_native_module(
  File "mixtral/mixtral_layer.py", line 40, in from_native_module
    LazyInitContext.materialize(module)
  File ".local/lib/python3.10/site-packages/colossalai/lazy/lazy_init.py", line 600, in materialize
    return _apply_to_lazy_module(module, apply_fn, verbose)
  File ".local/lib/python3.10/site-packages/colossalai/lazy/lazy_init.py", line 625, in _apply_to_lazy_module
    apply_fn(name, p)
  File ".local/lib/python3.10/site-packages/colossalai/lazy/lazy_init.py", line 598, in apply_fn
    p.materialize()
  File ".local/lib/python3.10/site-packages/colossalai/lazy/lazy_init.py", line 215, in materialize
    target = self._materialize_data()
  File ".local/lib/python3.10/site-packages/colossalai/lazy/lazy_init.py", line 240, in _materialize_data
    init_val = func(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 1 has a total capacty of 79.32 GiB of which 43.56 MiB is free. Process 3466292 has 79.28 GiB memory in use. Of the allocated memory 78.10 GiB is allocated by PyTorch, and 127.50 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "debug.py", line 553, in <module>
    main(args, cfg)
  File "debug.py", line 174, in main
    model, optimizer, _, _, lr_scheduler = booster.boost(model, optimizer, lr_scheduler=lr_scheduler)
  File ".local/lib/python3.10/site-packages/colossalai/booster/booster.py", line 138, in boost
    model, optimizer, criterion, dataloader, lr_scheduler = self.plugin.configure(
  File ".local/lib/python3.10/site-packages/colossalai/booster/plugin/moe_hybrid_parallel_plugin.py", line 355, in configure
    model = HybridParallelModule(
  File ".local/lib/python3.10/site-packages/colossalai/booster/plugin/hybrid_parallel_plugin.py", line 70, in __init__
    module, self.shared_params = shardformer.optimize(module, policy=custom_policy)
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/shardformer.py", line 54, in optimize
    shared_params = sharder.shard()
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 43, in shard
    self._replace_module(include=held_layers)
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 67, in _replace_module
    self._recursive_replace_layer(
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 115, in _recursive_replace_layer
    self._recursive_replace_layer(
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 115, in _recursive_replace_layer
    self._recursive_replace_layer(
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 115, in _recursive_replace_layer
    self._recursive_replace_layer(
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 112, in _recursive_replace_layer
    self._replace_sub_module(module, sub_module_replacement, include)
  File ".local/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 201, in _replace_sub_module
    raise RuntimeError(
RuntimeError: Failed to replace block_sparse_moe of type MixtralSparseMoeBlock with EPMixtralSparseMoeBlock with the exception: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 1 has a total capacty of 79.32 GiB of which 43.56 MiB is free. Process 3466292 has 79.28 GiB memory in use. Of the allocated memory 78.10 GiB is allocated by PyTorch, and 127.50 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well.
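The allocator message itself suggests tuning PYTORCH_CUDA_ALLOC_CONF against fragmentation. I can try something like the following before the first CUDA allocation, though the value is a guess and it probably won't help with the ~78 GiB already allocated on GPU 1:

```python
import os

# Suggested by the error message above; must be set before the first CUDA allocation,
# i.e. before any tensor is moved to the GPU. 128 is an arbitrary starting value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```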
Could someone help? Thanks a lot. cc @flybird11111 @ver217
Environment
- ColossalAI: main branch (0.3.6)
- PyTorch: 2.1.2
- CUDA: 11.8
- GPUs: 8 × 8 A800 (80 GB)
- Model: mistralai/Mixtral-8x7B-v0.1
- tp_size = 1
- ep_size = 8
- pp_size = 1