
Error when fine-tuning GatedCNN on multiple GPUs

Open • Embedding opened this issue 3 years ago • 0 comments

CUDA_VISIBLE_DEVICES=0,1 /dockerdata/anaconda3-2/bin/python run_classifier.py --vocab_path models/google_zh_vocab.txt \
             --config_path models/gatedcnn_9_config.json \
             --train_path datasets/chnsenticorp/train.tsv --dev_path datasets/chnsenticorp/dev.tsv --test_path datasets/chnsenticorp/test.tsv \
             --learning_rate 1e-4  --batch_size 64 --epochs_num 5 \
             --embedding word --remove_embedding_layernorm --encoder gatedcnn --pooling max
Traceback (most recent call last):
  File "run_classifier.py", line 339, in <module>
    main()
  File "run_classifier.py", line 317, in main
    loss = train_model(args, model, optimizer, scheduler, src_batch, tgt_batch, seg_batch, soft_tgt_batch)
  File "run_classifier.py", line 179, in train_model
    loss, _ = model(src_batch, tgt_batch, seg_batch, soft_tgt_batch)
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "run_classifier.py", line 42, in forward
    output = self.encoder(emb, seg)
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dockerdata/nlpzhezhao-14/uer_t5_4/UER-py-master/uer/encoders/cnn_encoder.py", line 61, in forward
    hidden += self.conv_b[i].repeat(1, 1, seq_length, 1)
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 426, in __getitem__
    idx = self._get_abs_string_index(idx)
  File "/dockerdata/anaconda3-2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 409, in _get_abs_string_index
    raise IndexError('index {} is out of range'.format(idx))
IndexError: index 0 is out of range
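
A possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the failing `__getitem__` lives in `torch/nn/modules/container.py`, which suggests `self.conv_b` is an `nn.ParameterList`. In some PyTorch releases, `nn.DataParallel` does not copy the entries of an `nn.ParameterList` into the per-GPU replicas, so the list can appear empty inside a replica and `self.conv_b[i]` fails with "index 0 is out of range". A minimal sketch of one workaround is to hold all per-layer biases in a single `nn.Parameter` and index into it. The class name `StackedBias`, its arguments, and the tensor shapes below are hypothetical and are not taken from the UER-py source.

```python
import torch
import torch.nn as nn


class StackedBias(nn.Module):
    """Hypothetical stand-in for a per-layer bias list such as conv_b."""

    def __init__(self, layers_num, hidden_size):
        super().__init__()
        # Assumption: each per-layer bias has shape (1, hidden_size, 1, 1).
        # All of them are stored in one tensor of shape
        # (layers_num, 1, hidden_size, 1, 1) instead of an nn.ParameterList.
        self.bias = nn.Parameter(torch.randn(layers_num, 1, hidden_size, 1, 1))

    def forward(self, hidden, i):
        # hidden: (batch, hidden_size, seq_length, 1).
        # self.bias[i] has shape (1, hidden_size, 1, 1); broadcasting adds it
        # along the batch and sequence dimensions, so no .repeat() is needed.
        return hidden + self.bias[i]


if __name__ == "__main__":
    module = StackedBias(layers_num=3, hidden_size=4)
    hidden = torch.zeros(2, 4, 5, 1)
    print(module(hidden, 1).shape)
```

Because a directly registered `nn.Parameter` is a plain attribute of the module, it should be visible in each replica. Running the same command with a single device in `CUDA_VISIBLE_DEVICES` is a quick way to check whether the error is specific to the multi-GPU path.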

Posted by Embedding on Apr 20 '21 06:04