run_mlm.py shows error
System Info
Hi. I'm training a BERT model with MLM using the following command. It seems that the values in attention_mask and token_type_ids become invalid:
TOKENIZERS_PARALLELISM=false \
NCCL_P2P_DISABLE=1 python3 run_mlm.py \
--model_name_or_path "kykim/bert-kor-base" \
--tokenizer_name "kykim/bert-kor-base" \
--train_file /mnt/STT_lm/korea_addr_50000_numtotext.txt \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--do_train \
--do_eval \
--output_dir ./snapshots/test-mlm-50000 \
--overwrite_output_dir \
--dataloader_num_workers 8 \
--max_seq_length 200 #\
# --line_by_line
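For reference, a single batch can be built and inspected outside the Trainer with the same tokenizer and the standard MLM collator (a minimal sketch; the address-like strings below are placeholders, not lines from the real training file):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Same tokenizer as in the command above; mlm_probability=0.15 is the run_mlm.py default.
tokenizer = AutoTokenizer.from_pretrained("kykim/bert-kor-base")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Placeholder sentences; the real data comes from the --train_file above.
enc = tokenizer(
    ["서울특별시 강남구 테헤란로 일이삼", "부산광역시 해운대구 센텀중앙로 사오"],
    truncation=True,
    max_length=200,
)
features = [{k: enc[k][i] for k in enc} for i in range(len(enc["input_ids"]))]
batch = collator(features)

# In a healthy batch, attention_mask contains only 0/1, token_type_ids only 0,
# and labels are -100 everywhere except at the masked positions.
print(batch["attention_mask"])
print(batch["token_type_ids"])
print(batch["labels"])
```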
After a few batches it throws this:
[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,285 >> BertForMaskedLM
attention_mask tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], device='cuda:1')
token_type_ids tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], device='cuda:1')
position_ids None
head_mask None
inputs_embeds None
encoder_hidden_states None
encoder_attention_mask None
output_attentions None
output_hidden_states None
return_dict True
[INFO|modeling_bert.py:1388] 2023-02-21 03:18:09,295 >>
prediction_scores: tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], device='cuda:1',
grad_fn=<ViewBackward0>)
labels: tensor([0, 0, 0, ..., 0, 0, 0], device='cuda:1')
[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,305 >> BertForMaskedLM
attention_mask tensor([[ 0, 0, 0, ..., 0,
0, 0],
[ 0, 0, 0, ..., 0,
0, 0],
[ 0, 0, 0, ..., 0,
0, 0],
...,
[ 0, 0, 0, ..., 0,
0, 0],
[ 0, 0, 0, ..., 0,
0, 0],
[ 0, 0, 0, ..., 0,
0, 0]], device='cuda:3')
token_type_ids tensor([[139726224884016, 139726226097792, 3254755329, ..., 0,
0, 139726226464001],
[ 0, 0, 0, ..., 0,
0, 0],
[ 0, 0, 0, ..., 0,
0, 0],
...,
[ 0, 0, 0, ..., 0,
0, 0],
[ 0, 0, 0, ..., 139726226538304,
106848880, 139761598107536],
[139726224884032, 139726226098720, 1, ..., 0,
0, 0]], device='cuda:3')
position_ids None
head_mask None
inputs_embeds None
encoder_hidden_states None
encoder_attention_mask None
output_attentions None
output_hidden_states None
return_dict True
[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,306 >> BertForMaskedLM
attention_mask tensor([[139726092393712, 0, 139726092318512, ..., 139726092393408, 0,
139726092318512],
[139726092393104, 0, 139726092318512, ..., 139726092392800, 0,
139726092318512],
[139726092392496, 0, 139726092318512, ..., 139726092392192, 0,
139726092318512],
...,
[139726092391888, 0, 139726092318512, ..., 139726092391584, 0,
139726092318512],
[139726092328960, 0, 139726092318512, ..., 139726092328656, 0,
139726092318512],
[139726092328352, 0, 139726092318512, ..., 139726092328048, 0,
139726092318512]], device='cuda:2')
token_type_ids tensor([[ 0, 0, 0, ..., 139726092397664, 0,
139726092324144],
[139726092397360, 0, 139726092324144, ..., 139726092397056, 0,
139726092324144],
[139726092396752, 0, 139726092324144, ..., 139726092329088, 0,
139726092324144],
...,
[139726092328784, 0, 139726092324144, ..., 139726092328480, 0,
139726092324144],
[139726092328176, 0, 139726092324144, ..., 139726092324144, 106848880,
139761598047072],
[139726090666304, 139726092316064, 1, ..., 0, 0,
0]], device='cuda:2')
position_ids None
head_mask None
inputs_embeds None
encoder_hidden_states None
encoder_attention_mask None
output_attentions None
output_hidden_states None
return_dict True
[INFO|modeling_bert.py:1370] 2023-02-21 03:18:09,306 >> BertForMaskedLM
attention_mask tensor([[139755215913280, 139755216000544, 1, ..., 0, 0, 0],
[139755215913280, 139755216808704, 1, ..., 139755216809376, 106848880, 139761598326640],
[139755215913280, 139755216807568, 1, ..., 139755216803312, 0, 139755216809376],
...,
[139755215913280, 139755216806672, 1, ..., 139755216797312, 0, 139755216809376],
[139755216804336, 0, 139755216809376, ..., 139755215913280, 139755215913280, 1],
[139755215913376, 139755216824784, 139764707470048, ..., 139762620122257, 0, 0]],
device='cuda:0')
token_type_ids tensor([[139761598326800, 0, 0, ..., 139755216815408, 139755215913088, 64],
[139755216806224, 139755215913088, 64, ..., 139755216807216, 139755215913088, 64],
[139755215913280, 139755216814896, 32, ..., 139755215913280, 139755216819504, 1],
...,
[139755215913280, 139755215913280, 1, ..., 139755215913328, 139755215913328, 32],
[139755215913376, 139755216830560, 139764707470048, ..., 0, 0, 0],
[ 0, 0, 0, ..., 139755215913232, 139755215913232, 0]],
device='cuda:0')
position_ids None
head_mask None
inputs_embeds None
encoder_hidden_states None
encoder_attention_mask None
output_attentions None
output_hidden_states None
return_dict True
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [22,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [23,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [24,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [27,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
File "run_mlm.py", line 645, in <module>
main()
File "run_mlm.py", line 594, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/transformers/src/transformers/trainer.py", line 1576, in train
return inner_training_loop(
File "/transformers/src/transformers/trainer.py", line 1843, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/transformers/src/transformers/trainer.py", line 2588, in training_step
loss = self.compute_loss(model, inputs)
File "/transformers/src/transformers/trainer.py", line 2620, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/transformers/src/transformers/models/bert/modeling_bert.py", line 1384, in forward
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/transformers/src/transformers/models/bert/modeling_bert.py", line 708, in forward
prediction_scores = self.predictions(sequence_output)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/transformers/src/transformers/models/bert/modeling_bert.py", line 697, in forward
hidden_states = self.transform(hidden_states)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/transformers/src/transformers/models/bert/modeling_bert.py", line 676, in forward
hidden_states = self.dense(hidden_states)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())`
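For what it's worth, the `t >= 0 && t < n_classes` assertion from nll_loss means some label id fed to the MLM head falls outside `[0, vocab_size)`, and since CUDA errors are reported asynchronously, the later CUBLAS failure in `F.linear` is most likely just a downstream symptom of that assert (re-running with `CUDA_LAUNCH_BLOCKING=1` would pin it to the exact op). Together with the garbage values in attention_mask/token_type_ids, this suggests the batch is being corrupted before or while it is scattered to the DataParallel replicas. A quick check that the tokenizer and the checkpoint at least agree on the vocabulary size (a rough sketch, not a fix):

```python
from transformers import AutoConfig, AutoTokenizer

# Same checkpoint as above; any label id >= config.vocab_size (other than -100)
# would trigger exactly this nll_loss assertion.
tokenizer = AutoTokenizer.from_pretrained("kykim/bert-kor-base")
config = AutoConfig.from_pretrained("kykim/bert-kor-base")
print("tokenizer vocab:", len(tokenizer), "| model vocab_size:", config.vocab_size)

# Placeholder sentence; in practice, scan a few lines of the real training file.
ids = tokenizer("서울특별시 강남구 테헤란로 일이삼", truncation=True, max_length=200)["input_ids"]
assert all(0 <= i < config.vocab_size for i in ids), "token id out of range for the MLM head"
```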
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
TOKENIZERS_PARALLELISM=false \
NCCL_P2P_DISABLE=1 python3 run_mlm.py \
--model_name_or_path "kykim/bert-kor-base" \
--tokenizer_name "kykim/bert-kor-base" \
--train_file /mnt/STT_lm/korea_addr_50000_numtotext.txt \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--do_train \
--do_eval \
--output_dir ./snapshots/test-mlm-50000 \
--overwrite_output_dir \
--dataloader_num_workers 8 \
--max_seq_length 200 #\
# --line_by_line
Expected behavior
.
Could you please provide the result of `transformers-cli env` as instructed in the template?
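For reference, the core of that report can also be collected by hand if the CLI is unavailable for some reason (a rough sketch; `transformers-cli env` itself prints a more complete, formatted summary):

```python
# Rough manual stand-in for `transformers-cli env` (not its exact output format).
import platform

import torch
import transformers

print("transformers version:", transformers.__version__)
print("Python version:", platform.python_version())
print("Platform:", platform.platform())
print("PyTorch version:", torch.__version__)
print("CUDA (build):", torch.version.cuda, "| GPU available:", torch.cuda.is_available())
```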
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.