
[BUG]: RuntimeError: CUDA error: an illegal memory access was encountered

Open paulpaulzhang opened this issue 3 years ago • 4 comments

🐛 Describe the bug

I run BERT from Hugging Face with ZeRO, but get RuntimeError: CUDA error: an illegal memory access was encountered. I found that this problem seems to be caused by the initial_scale setting in config.py.

Traceback (most recent call last):
  File "colossalai/run.py", line 463, in <module>
    train(args)
  File "colossalai/run.py", line 252, in train
    trainer(model,
  File "colossalai/run.py", line 127, in trainer
    engine.backward(loss)
  File "/home/paulzhang/miniconda3/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 163, in backward
    ret = self.optimizer.backward(loss)
  File "/home/paulzhang/miniconda3/lib/python3.8/site-packages/colossalai/zero/sharded_optim/sharded_optim_v2.py", line 169, in backward
    self.model.backward(loss)
  File "/home/paulzhang/miniconda3/lib/python3.8/site-packages/colossalai/zero/sharded_model/sharded_model_v2.py", line 233, in backward
    loss.backward()
  File "/home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9d1dfa2d62 in /home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1c5f3 (0x7f9d6164f5f3 in /home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f9d61650002 in /home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f9d1df8c314 in /home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x29adb9 (0x7f9de496cdb9 in /home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xae0c91 (0x7f9de51b2c91 in /home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f9de51b2f92 in /home/paulzhang/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x15893b (0x56473bab593b in /home/paulzhang/miniconda3/bin/python)
frame #8: <unknown function> + 0x193141 (0x56473baf0141 in /home/paulzhang/miniconda3/bin/python)
frame #9: <unknown function> + 0x15893b (0x56473bab593b in /home/paulzhang/miniconda3/bin/python)
frame #10: <unknown function> + 0x193141 (0x56473baf0141 in /home/paulzhang/miniconda3/bin/python)
frame #11: <unknown function> + 0x158415 (0x56473bab5415 in /home/paulzhang/miniconda3/bin/python)
frame #12: <unknown function> + 0x15893b (0x56473bab593b in /home/paulzhang/miniconda3/bin/python)
frame #13: <unknown function> + 0x193141 (0x56473baf0141 in /home/paulzhang/miniconda3/bin/python)
frame #14: <unknown function> + 0x1592ac (0x56473bab62ac in /home/paulzhang/miniconda3/bin/python)
frame #15: <unknown function> + 0x158e77 (0x56473bab5e77 in /home/paulzhang/miniconda3/bin/python)
frame #16: <unknown function> + 0x158e60 (0x56473bab5e60 in /home/paulzhang/miniconda3/bin/python)
frame #17: <unknown function> + 0x158e60 (0x56473bab5e60 in /home/paulzhang/miniconda3/bin/python)
frame #18: <unknown function> + 0x176057 (0x56473bad3057 in /home/paulzhang/miniconda3/bin/python)
frame #19: PyDict_SetItemString + 0x61 (0x56473baf43c1 in /home/paulzhang/miniconda3/bin/python)
frame #20: PyImport_Cleanup + 0x9d (0x56473bb32aad in /home/paulzhang/miniconda3/bin/python)
frame #21: Py_FinalizeEx + 0x79 (0x56473bb64a49 in /home/paulzhang/miniconda3/bin/python)
frame #22: Py_RunMain + 0x183 (0x56473bb66893 in /home/paulzhang/miniconda3/bin/python)
frame #23: Py_BytesMain + 0x39 (0x56473bb66ca9 in /home/paulzhang/miniconda3/bin/python)
frame #24: __libc_start_main + 0xf3 (0x7f9e409e50b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #25: <unknown function> + 0x1e21c7 (0x56473bb3f1c7 in /home/paulzhang/miniconda3/bin/python)

Environment

No response

paulpaulzhang avatar Jul 18 '22 13:07 paulpaulzhang
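(The config.py mentioned above is not included in the report. For reference, a sketch of what a ZeRO section with initial_scale looked like in example configs of the ColossalAI release of that period; the exact keys are assumptions and vary across versions.)

# Hypothetical config.py (not the reporter's actual file). The optimizer section
# of the ZeRO config exposed AMP-style loss-scaling knobs, including initial_scale;
# an overly large initial scale can lead to inf/NaN gradients and overflow skips
# until a stable scale is found.
from colossalai.zero.shard_utils import TensorShardStrategy

zero = dict(
    model_config=dict(
        shard_strategy=TensorShardStrategy(),
        tensor_placement_policy='cuda',
    ),
    optimizer_config=dict(
        initial_scale=2**5,      # try lowering this if training reports inf
        growth_factor=2,
        backoff_factor=0.5,
        growth_interval=1000,
        hysteresis=2,
        max_scale=2**32,
    ),
)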

Could you share your code with me?

ver217 avatar Jul 20 '22 08:07 ver217

This usually occurs because of CUDA out-of-memory.

FrankLeeeee avatar Jul 20 '22 08:07 FrankLeeeee
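(For anyone debugging this: a minimal sketch, in plain PyTorch and nothing ColossalAI-specific, for telling a genuine illegal access apart from an out-of-memory situation. CUDA errors are reported asynchronously, so the Python stack trace can point far from the faulting kernel unless launches are made blocking.)

import os
# Must be set before the first CUDA call in the process, e.g. via
# `CUDA_LAUNCH_BLOCKING=1 python train.py` or at the very top of the script.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch

def log_cuda_memory(tag: str) -> None:
    """Print allocated/reserved GPU memory so an OOM trend is visible before the crash."""
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

# e.g. call log_cuda_memory(f"step {step}") right before engine.backward(loss)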

@ver217 this is my code

# Imports this snippet appears to assume (module paths per the ColossalAI 0.1.x-era API):
import colossalai
import torch
from torch import nn
from tqdm import tqdm
from transformers import BertConfig, BertForSequenceClassification
from colossalai.nn.optimizer import HybridAdam
from colossalai.zero.init_ctx import ZeroInitContext
from colossalai.zero.shard_utils import TensorShardStrategy


def trainer(train_dataloader, args, val_dataloader=None):
    start_epoch = 0

    # Build the model inside the ZeRO init context so its parameters are sharded on creation
    shard_strategy = TensorShardStrategy()
    with ZeroInitContext(target_device=torch.cuda.current_device(),
                         shard_strategy=shard_strategy,
                         shard_param=True):
        config = BertConfig.from_pretrained(args.model_name_or_path, num_labels=200)
        model = BertForSequenceClassification.from_pretrained(args.model_name_or_path, config=config)

    optimizer = HybridAdam(model.parameters(), weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()

    # Start ColossalAI initialization: wrap model/optimizer/criterion/dataloaders into an engine
    engine, train_dataloader, val_dataloader, _ = colossalai.initialize(model,
                                                                        optimizer,
                                                                        criterion,
                                                                        train_dataloader,
                                                                        val_dataloader,
                                                                        )

    for epoch in range(start_epoch, args.num_epochs):
        epoch_loss = 0

        train_iter = tqdm(
            train_dataloader, desc=f'Epoch:{epoch + 1}', total=len(train_dataloader))

        engine.train()

        torch.cuda.empty_cache()

        for step, inputs in enumerate(train_iter):
            labels = inputs['labels'].view(-1).to(args.device)
            inputs = {key: inputs[key].to(args.device)
                      for key in inputs.keys() if key not in ['labels']}

            output = engine(inputs['text_input_ids'], attention_mask=inputs['text_mask'])
            loss = engine.criterion(output.logits, labels)

            engine.backward(loss)
            engine.step()
            epoch_loss += loss

            train_iter.set_postfix_str(
                f'loss: {epoch_loss / (step+1):.4f}')

paulpaulzhang avatar Jul 20 '22 10:07 paulpaulzhang
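(Not shown in the snippet is how the process group and config are set up. A sketch of an entry point, assuming the launch_from_torch API of that release and a hypothetical ./config.py holding the zero settings; parse_args and build_dataloaders are hypothetical helpers.)

import colossalai

# Hypothetical entry point (not the reporter's actual run.py): the config file
# passed here is where the zero/initial_scale settings would live, and it must
# be loaded before colossalai.initialize() builds the engine inside trainer().
def main():
    args = parse_args()                                          # hypothetical
    colossalai.launch_from_torch(config='./config.py')
    train_dataloader, val_dataloader = build_dataloaders(args)   # hypothetical
    trainer(train_dataloader, args, val_dataloader)

if __name__ == '__main__':
    main()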

"This usually occurs because of CUDA out-of-memory." Yes, after enabling ZeRO the memory does seem to overflow: GPU memory keeps growing until it runs out. Also, after turning ZeRO on, ColossalAI reports inf values during training.

paulpaulzhang avatar Jul 20 '22 10:07 paulpaulzhang
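(One thing worth checking in the loop posted above, as an observation rather than a confirmed root cause: epoch_loss += loss accumulates CUDA tensors that still carry autograd history, which the PyTorch FAQ flags as a classic source of steadily growing memory. A minimal change:)

# Accumulate a Python float instead of a CUDA tensor so no autograd history
# is retained across steps (see the PyTorch FAQ on accumulating history).
epoch_loss += loss.item()

train_iter.set_postfix_str(f'loss: {epoch_loss / (step + 1):.4f}')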

I have the same problem, and I'm sure the GPU memory is enough.

WindCanDie avatar Apr 10 '23 11:04 WindCanDie

It means that the CUDA version and the graphics card are not compatible; just replace one of them.

Caesar1993 avatar May 26 '23 08:05 Caesar1993
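(If an incompatible CUDA/driver/GPU combination is suspected, a quick way to see what the installed PyTorch build actually supports, using standard PyTorch calls:)

import torch

# Mismatches between the CUDA toolkit PyTorch was built with, the driver,
# and the GPU's compute capability can also surface as illegal memory accesses.
print("torch version:      ", torch.__version__)
print("built with CUDA:    ", torch.version.cuda)
print("device:             ", torch.cuda.get_device_name(0))
print("compute capability: ", torch.cuda.get_device_capability(0))
print("compiled arch list: ", torch.cuda.get_arch_list())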