[BUG]: "RuntimeError: CUDA error: device-side assert triggered" on changing SEQ_LEN in Gemini

Open ivrschool opened this issue 2 years ago • 3 comments

🐛 Describe the bug

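In the Gemini GPT example (`examples/language/gpt/gemini/train_gpt_demo.py`), I changed SEQ_LEN from 1024 to 1600 and the model to gpt2_xl. The forward pass then fails with a CUDA device-side assert. The processes print the following output: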
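A possible cause, which is an assumption and not confirmed by the log: the `srcIndex < srcSelectDimSize` assertion in the output is what an index-select kernel raises when an index runs past the end of the table it reads from, and an embedding lookup is a typical place for that. The stock GPT-2 configuration has 1024 position slots, which is fewer than SEQ_LEN = 1600. The sketch below is a plain-Python pre-flight check that would surface such a mismatch as a readable error before anything reaches the GPU. The function name `check_embedding_indices` and its parameters are hypothetical and not part of ColossalAI or transformers.

```python
from typing import Sequence


def check_embedding_indices(
    input_ids: Sequence[Sequence[int]],
    vocab_size: int,
    max_positions: int,
) -> None:
    """Raise ValueError if any lookup would fall outside an embedding table.

    Hypothetical helper for illustration only.
    input_ids: a batch of token-id sequences (plain Python ints here).
    vocab_size: number of rows in the token-embedding table.
    max_positions: number of rows in the position-embedding table.
    """
    for row, seq in enumerate(input_ids):
        # Position ids run from 0 to len(seq) - 1, so the sequence length
        # must not exceed the size of the position table.
        if len(seq) > max_positions:
            raise ValueError(
                f"row {row}: sequence length {len(seq)} exceeds "
                f"max_positions={max_positions}"
            )
        for col, token in enumerate(seq):
            # Every token id must be a valid row of the token table.
            if not 0 <= token < vocab_size:
                raise ValueError(
                    f"row {row}, col {col}: token id {token} is outside "
                    f"[0, {vocab_size})"
                )
```

Calling `check_embedding_indices([[0] * 1600], vocab_size=50257, max_positions=1024)` raises a `ValueError` that names the sequence length, while a batch that fits both tables passes silently. If the check fails for the real configuration, the position-embedding size of the model would need to be raised to at least SEQ_LEN.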

/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [38,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 353, in <module>
    main()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 343, in main
    train_step()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 303, in train_step
    outputs = model(input_ids, attn_mask)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/nn/parallel/data_parallel.py", line 279, in forward
    outputs = self.module(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/commons/model_zoo.py", line 29, in forward
    return self.model(input_ids=input_ids, attention_mask=attention_mask, use_cache=not self.checkpoint)[0]
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1043, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 877, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 873, in custom_forward
    return module(*inputs, use_cache, output_attentions)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 387, in forward
    hidden_states = self.ln_1(hidden_states)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/functional.py", line 2500, in layer_norm
    return handle_torch_function(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/overrides.py", line 1498, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/colo_parameter.py", line 85, in __torch_function__
    new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 84, in pre_op
    ColoParamOpHookManager._trigger_pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 65, in _trigger_pre_forward
    hook.pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 47, in pre_forward
    self.pre_op(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 32, in pre_op
    self._gemini_manager.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/gemini_mgr.py", line 144, in sample_overall_data
    self._mem_stats_collector.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memstats_collector.py", line 87, in sample_overall_data
    cuda_overall = self._mem_monitor.finish()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memory_monitor.py", line 143, in finish
    torch.cuda.synchronize()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/cuda/__init__.py", line 496, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [100,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 353, in <module>
    main()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 343, in main
    train_step()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 303, in train_step
    outputs = model(input_ids, attn_mask)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/nn/parallel/data_parallel.py", line 279, in forward
    outputs = self.module(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/commons/model_zoo.py", line 29, in forward
    return self.model(input_ids=input_ids, attention_mask=attention_mask, use_cache=not self.checkpoint)[0]
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1043, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 877, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 873, in custom_forward
    return module(*inputs, use_cache, output_attentions)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 387, in forward
    hidden_states = self.ln_1(hidden_states)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/functional.py", line 2500, in layer_norm
    return handle_torch_function(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/overrides.py", line 1498, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/colo_parameter.py", line 85, in __torch_function__
    new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 84, in pre_op
    ColoParamOpHookManager._trigger_pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 65, in _trigger_pre_forward
    hook.pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 47, in pre_forward
    self.pre_op(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 32, in pre_op
    self._gemini_manager.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/gemini_mgr.py", line 144, in sample_overall_data
    self._mem_stats_collector.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memstats_collector.py", line 87, in sample_overall_data
    cuda_overall = self._mem_monitor.finish()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memory_monitor.py", line 143, in finish
    torch.cuda.synchronize()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/cuda/__init__.py", line 496, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [56,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 353, in <module>
    main()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 343, in main
    train_step()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 303, in train_step
    outputs = model(input_ids, attn_mask)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/nn/parallel/data_parallel.py", line 279, in forward
    outputs = self.module(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/commons/model_zoo.py", line 29, in forward
    return self.model(input_ids=input_ids, attention_mask=attention_mask, use_cache=not self.checkpoint)[0]
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1043, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 877, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 873, in custom_forward
    return module(*inputs, use_cache, output_attentions)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 387, in forward
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [82,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    hidden_states = self.ln_1(hidden_states)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/functional.py", line 2500, in layer_norm
    return handle_torch_function(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/overrides.py", line 1498, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/colo_parameter.py", line 85, in __torch_function__
    new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 84, in pre_op
    ColoParamOpHookManager._trigger_pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 65, in _trigger_pre_forward
    hook.pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 47, in pre_forward
    self.pre_op(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 32, in pre_op
    self._gemini_manager.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/gemini_mgr.py", line 144, in sample_overall_data
    self._mem_stats_collector.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memstats_collector.py", line 87, in sample_overall_data
    cuda_overall = self._mem_monitor.finish()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memory_monitor.py", line 143, in finish
    torch.cuda.synchronize()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/cuda/__init__.py", line 496, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 353, in <module>
    main()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 343, in main
    train_step()
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/./train_gpt_demo.py", line 303, in train_step
    outputs = model(input_ids, attn_mask)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/nn/parallel/data_parallel.py", line 279, in forward
    outputs = self.module(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Desktop/ColossalAI/examples/language/gpt/gemini/commons/model_zoo.py", line 29, in forward
    return self.model(input_ids=input_ids, attention_mask=attention_mask, use_cache=not self.checkpoint)[0]
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1043, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 877, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 873, in custom_forward
    return module(*inputs, use_cache, output_attentions)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 387, in forward
    hidden_states = self.ln_1(hidden_states)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/functional.py", line 2500, in layer_norm
    return handle_torch_function(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/overrides.py", line 1498, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/colo_parameter.py", line 85, in __torch_function__
    new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 84, in pre_op
    ColoParamOpHookManager._trigger_pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/tensor/param_op_hook.py", line 65, in _trigger_pre_forward
    hook.pre_forward(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 47, in pre_forward
    self.pre_op(params)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/zero/utils/gemini_hook.py", line 32, in pre_op
    self._gemini_manager.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/gemini_mgr.py", line 144, in sample_overall_data
    self._mem_stats_collector.sample_overall_data()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memstats_collector.py", line 87, in sample_overall_data
    cuda_overall = self._mem_monitor.finish()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/gemini/memory_tracer/memory_monitor.py", line 143, in finish
    torch.cuda.synchronize()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/cuda/__init__.py", line 496, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/cuda/CUDAEvent.h:91 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fa743002497 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x13c (0x7fa77cf03d8c in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7fa77cf05d68 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x221 (0x7fa77cf072f1 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xcda93 (0x7fa7850e8a93 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7fa7bab8d609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7fa7ba94c133 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/cuda/CUDAEvent.h:91 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5cea2b6497 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x13c (0x7f5d241b7d8c in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f5d241b9d68 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x221 (0x7f5d241bb2f1 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xcda93 (0x7f5d2c39ca93 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f5d61e41609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f5d61c00133 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/cuda/CUDAEvent.h:91 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f6604f1f497 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x13c (0x7f663ee20d8c in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f663ee22d68 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x221 (0x7f663ee242f1 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xcda93 (0x7f6647005a93 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f667caaa609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f667c869133 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1659484806139/work/aten/src/ATen/cuda/CUDAEvent.h:91 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fd441753497 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x13c (0x7fd47b654d8c in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7fd47b656d68 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x221 (0x7fd47b6582f1 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xcda93 (0x7fd483839a93 in /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7fd4b92de609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7fd4b909d133 in /lib/x86_64-linux-gnu/libc.so.6)

Environment

  • export DISTPLAN=CAI_Gemini
  • DISTPLAN=CAI_Gemini
  • export GPUNUM=4
  • GPUNUM=4
  • export TPDEGREE=1
  • TPDEGREE=1
  • export PLACEMENT=auto
  • PLACEMENT=auto
  • export USE_SHARD_INIT=True
  • USE_SHARD_INIT=True
  • export BATCH_SIZE=32
  • BATCH_SIZE=32
  • export MODEL_TYPE=gpt2_xl
  • MODEL_TYPE=gpt2_xl
  • export TRAIN_STEP=10
  • TRAIN_STEP=10
  • '[' True = True ']'
  • USE_SHARD_INIT=--shardinit
  • mkdir -p gemini_logs
  • torchrun --standalone --nproc_per_node=4 ./train_gpt_demo.py --tp_degree=1 --model_type=gpt2_xl --batch_size=32 --placement=auto --shardinit --distplan=CAI_Gemini --train_step=10
  • tee ./gemini_logs/gpt2_xl_CAI_Gemini_gpu_4_bs_32_tp_1_auto.log

ivrschool avatar Feb 06 '23 20:02 ivrschool

Bot detected the issue body's language is not English, translate it automatically.


Title: [BUG]: RuntimeError: CUDA error: device-side assert triggered on changing SEQ_LEN in Gemini

Issues-translate-bot avatar Feb 06 '23 20:02 Issues-translate-bot

Add another argument, max_seq_len=1600, to line, and it will train normally.
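
For readers wondering why this helps, here is a minimal plain-Python sketch of the likely failure mode. It assumes the model's position-embedding table was built for 1024 positions (the original SEQ_LEN in this issue), so position ids 1024..1599 fall outside the table once SEQ_LEN is 1600. The names `PositionTable` and `try_lookup` are hypothetical and only illustrate the bounds check; they are not part of ColossalAI, transformers, or PyTorch.

```python
# Illustrative sketch only: a plain-Python stand-in for a learned
# position-embedding table. PositionTable and try_lookup are hypothetical names.

class PositionTable:
    """Mimics an embedding table with one row per position."""

    def __init__(self, max_seq_len: int, hidden_size: int = 4):
        self.max_seq_len = max_seq_len
        # One row per position 0 .. max_seq_len - 1.
        self.rows = [[0.0] * hidden_size for _ in range(max_seq_len)]

    def lookup(self, seq_len: int):
        # Position ids for a sequence are 0 .. seq_len - 1.
        position_ids = range(seq_len)
        out_of_range = [p for p in position_ids if p >= self.max_seq_len]
        if out_of_range:
            # Assumption: on the GPU, this same condition is what trips the
            # `srcIndex < srcSelectDimSize` assertion. Here we raise a
            # readable error instead.
            raise IndexError(
                f"{len(out_of_range)} position ids >= table size "
                f"{self.max_seq_len} (first bad id: {out_of_range[0]})"
            )
        return [self.rows[p] for p in position_ids]


def try_lookup(max_seq_len: int, seq_len: int) -> str:
    """Return 'ok' or the error text, so both cases can be compared."""
    try:
        PositionTable(max_seq_len).lookup(seq_len)
        return "ok"
    except IndexError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_lookup(max_seq_len=1024, seq_len=1600))  # table too small
    print(try_lookup(max_seq_len=1600, seq_len=1600))  # table matches SEQ_LEN
```

If this is indeed the cause, any model whose position table is smaller than the sequence length would fail the same way, so the table size has to be raised together with SEQ_LEN.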

JThh avatar Feb 09 '23 05:02 JThh

Yes @JThh, it works. Thank you.

ivrschool avatar Feb 14 '23 04:02 ivrschool

Glad to hear it was resolved. Thanks.

binmakeswell avatar Apr 18 '23 08:04 binmakeswell

I hit the same problem when running with bloom. Adding another argument, max_seq_len=1600, does not work. Please help me, thanks.

Modas-Li avatar May 16 '23 02:05 Modas-Li