
run_mlm.py shows error

Open dykim3 opened this issue 2 years ago • 2 comments

System Info

Hi. I'm training a BERT model with MLM using the following command. It seems that the values in attention_mask and token_type_ids become invalid.


TOKENIZERS_PARALLELISM=false \
NCCL_P2P_DISABLE=1 python3 run_mlm.py     \
    --model_name_or_path "kykim/bert-kor-base" \
    --tokenizer_name "kykim/bert-kor-base" \
    --train_file /mnt/STT_lm/korea_addr_50000_numtotext.txt \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --do_train \
    --do_eval \
    --output_dir ./snapshots/test-mlm-50000 \
    --overwrite_output_dir \
    --dataloader_num_workers 8 \
    --max_seq_length 200 #\
    # --line_by_line

After a few batches, it throws this:

[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,285 >> BertForMaskedLM 
            attention_mask tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:1')
            token_type_ids tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:1')
            position_ids None
            head_mask None
            inputs_embeds None
            encoder_hidden_states None
            encoder_attention_mask None
            output_attentions None
            output_hidden_states None
            return_dict True
        
[INFO|modeling_bert.py:1388] 2023-02-21 03:18:09,295 >> 
            prediction_scores: tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:1',
       grad_fn=<ViewBackward0>)
            labels: tensor([0, 0, 0,  ..., 0, 0, 0], device='cuda:1')
            
[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,305 >> BertForMaskedLM 
            attention_mask tensor([[          0,           0,           0,  ...,           0,
                   0,           0],
        [          0,           0,           0,  ...,           0,
                   0,           0],
        [          0,           0,           0,  ...,           0,
                   0,           0],
        ...,
        [          0,           0,           0,  ...,           0,
                   0,           0],
        [          0,           0,           0,  ...,           0,
                   0,           0],
        [          0,           0,           0,  ...,           0,
                   0,           0]], device='cuda:3')
            token_type_ids tensor([[139726224884016, 139726226097792,  3254755329,  ...,           0,
                   0, 139726226464001],
        [          0,           0,           0,  ...,           0,
                   0,           0],
        [          0,           0,           0,  ...,           0,
                   0,           0],
        ...,
        [          0,           0,           0,  ...,           0,
                   0,           0],
        [          0,           0,           0,  ..., 139726226538304,
           106848880, 139761598107536],
        [139726224884032, 139726226098720,           1,  ...,           0,
                   0,           0]], device='cuda:3')
            position_ids None
            head_mask None
            inputs_embeds None
            encoder_hidden_states None
            encoder_attention_mask None
            output_attentions None
            output_hidden_states None
            return_dict True
        
[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,306 >> BertForMaskedLM 
            attention_mask tensor([[139726092393712,          0, 139726092318512,  ..., 139726092393408,          0,
         139726092318512],
        [139726092393104,          0, 139726092318512,  ..., 139726092392800,          0,
         139726092318512],
        [139726092392496,          0, 139726092318512,  ..., 139726092392192,          0,
         139726092318512],
        ...,
        [139726092391888,          0, 139726092318512,  ..., 139726092391584,          0,
         139726092318512],
        [139726092328960,          0, 139726092318512,  ..., 139726092328656,          0,
         139726092318512],
        [139726092328352,          0, 139726092318512,  ..., 139726092328048,          0,
         139726092318512]], device='cuda:2')
            token_type_ids tensor([[         0,          0,          0,  ..., 139726092397664,          0,
         139726092324144],
        [139726092397360,          0, 139726092324144,  ..., 139726092397056,          0,
         139726092324144],
        [139726092396752,          0, 139726092324144,  ..., 139726092329088,          0,
         139726092324144],
        ...,
        [139726092328784,          0, 139726092324144,  ..., 139726092328480,          0,
         139726092324144],
        [139726092328176,          0, 139726092324144,  ..., 139726092324144,  106848880,
         139761598047072],
        [139726090666304, 139726092316064,          1,  ...,          0,          0,
                  0]], device='cuda:2')
            position_ids None
            head_mask None
            inputs_embeds None
            encoder_hidden_states None
            encoder_attention_mask None
            output_attentions None
            output_hidden_states None
            return_dict True
        
[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,306 >> BertForMaskedLM 
            attention_mask tensor([[139755215913280, 139755216000544,        1,  ...,        0,        0,        0],
        [139755215913280, 139755216808704,        1,  ..., 139755216809376, 106848880, 139761598326640],
        [139755215913280, 139755216807568,        1,  ..., 139755216803312,        0, 139755216809376],
        ...,
        [139755215913280, 139755216806672,        1,  ..., 139755216797312,        0, 139755216809376],
        [139755216804336,        0, 139755216809376,  ..., 139755215913280, 139755215913280,        1],
        [139755215913376, 139755216824784, 139764707470048,  ..., 139762620122257,        0,        0]],
       device='cuda:0')
            token_type_ids tensor([[139761598326800,        0,        0,  ..., 139755216815408, 139755215913088,       64],
        [139755216806224, 139755215913088,       64,  ..., 139755216807216, 139755215913088,       64],
        [139755215913280, 139755216814896,       32,  ..., 139755215913280, 139755216819504,        1],
        ...,
        [139755215913280, 139755215913280,        1,  ..., 139755215913328, 139755215913328,       32],
        [139755215913376, 139755216830560, 139764707470048,  ...,        0,        0,        0],
        [       0,        0,        0,  ..., 139755215913232, 139755215913232,        0]],
       device='cuda:0')
            position_ids None
            head_mask None
            inputs_embeds None
            encoder_hidden_states None
            encoder_attention_mask None
            output_attentions None
            output_hidden_states None
            return_dict True
        
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [22,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [23,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [24,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [27,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "run_mlm.py", line 645, in <module>
    main()
  File "run_mlm.py", line 594, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/transformers/src/transformers/trainer.py", line 1576, in train
    return inner_training_loop(
  File "/transformers/src/transformers/trainer.py", line 1843, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2588, in training_step
    loss = self.compute_loss(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2620, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/transformers/src/transformers/models/bert/modeling_bert.py", line 1384, in forward
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/transformers/src/transformers/models/bert/modeling_bert.py", line 708, in forward
    prediction_scores = self.predictions(sequence_output)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/transformers/src/transformers/models/bert/modeling_bert.py", line 697, in forward
    hidden_states = self.transform(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/transformers/src/transformers/models/bert/modeling_bert.py", line 676, in forward
    hidden_states = self.dense(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())`
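
The `t >= 0 && t < n_classes` assertion comes from the NLL loss kernel and fires when a target label lies outside the valid class range. The very large values in attention_mask and token_type_ids above resemble raw memory addresses, which would be consistent with the batch being read from uninitialized or corrupted memory, though the logs alone do not confirm that. A minimal sketch of a sanity check that could be run on a batch before the forward pass is shown below. It assumes the batch fields have been converted to nested lists (for example with `tensor.tolist()`), that `type_vocab_size` is 2 as in the default BERT config, and that `vocab_size` is read from the model config; `find_invalid_values` is a hypothetical helper, not part of transformers.

```python
from typing import Dict, List, Sequence

# Default ignore_index used by PyTorch's cross-entropy / NLL loss.
IGNORE_INDEX = -100


def find_invalid_values(
    batch: Dict[str, Sequence[Sequence[int]]],
    vocab_size: int,
    type_vocab_size: int = 2,  # assumption: default BERT config value
) -> List[str]:
    """Describe every out-of-range value in the known fields of a batch.

    Hypothetical helper for illustration; `batch` maps field names to
    2-D nested lists of ints. Unknown fields are skipped.
    """
    checks = {
        "attention_mask": lambda v: v in (0, 1),
        "token_type_ids": lambda v: 0 <= v < type_vocab_size,
        "input_ids": lambda v: 0 <= v < vocab_size,
        "labels": lambda v: v == IGNORE_INDEX or 0 <= v < vocab_size,
    }
    problems = []
    for name, rows in batch.items():
        is_valid = checks.get(name)
        if is_valid is None:
            continue  # field not covered by this sketch
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                if not is_valid(value):
                    problems.append(f"{name}[{r}][{c}] = {value}")
    return problems


# Toy batch with made-up values; vocab_size=10 is only for the example.
example = {
    "attention_mask": [[1, 1, 0], [139726092393712, 0, 1]],
    "token_type_ids": [[0, 0, 0], [0, 106848880, 0]],
    "labels": [[-100, 5, -100], [-100, -100, 7]],
}
for problem in find_invalid_values(example, vocab_size=10):
    print(problem)
```

Running such a check on the CPU side, before the batch reaches the GPU, would help tell whether the data is already invalid when it leaves the data loader or only after it is scattered across devices.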

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

TOKENIZERS_PARALLELISM=false \
NCCL_P2P_DISABLE=1 python3 run_mlm.py     \
    --model_name_or_path "kykim/bert-kor-base" \
    --tokenizer_name "kykim/bert-kor-base" \
    --train_file /mnt/STT_lm/korea_addr_50000_numtotext.txt \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --do_train \
    --do_eval \
    --output_dir ./snapshots/test-mlm-50000 \
    --overwrite_output_dir \
    --dataloader_num_workers 8 \
    --max_seq_length 200 #\
    # --line_by_line

Expected behavior

.

dykim3 • Feb 21 '23 03:02

Could you please provide the result of transformers-cli env as instructed in the template?
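
For reference, `transformers-cli env` is the command the issue template asks for, and its output is what should be pasted into the issue. Where it cannot be run, a rough stand-in can be assembled from the standard library, as in the sketch below. `collect_env` and its package list are assumptions made for illustration, not part of transformers, and the result covers less than the official report.

```python
import platform
import sys
from importlib import metadata


def collect_env(packages=("transformers", "torch", "tokenizers", "datasets")):
    """Collect Python, platform, and installed package versions.

    Hypothetical helper; the package list is an assumption about what
    is relevant for this issue.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```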

sgugger • Feb 21 '23 08:02

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Mar 23 '23 15:03