Wrong tensor shape during pretrain

Open AlexPak opened this issue 4 years ago • 0 comments

[INFO] 2020-05-04 11:56:22 > Run name : BERT-BERT-{phase}-layers_count={layers_count}-hidden_size={hidden_size}-heads_count={heads_count}-{timestamp}-layers_count=1-hidden_size=128-heads_count=2-2020_05_04_11_56_22
[INFO] 2020-05-04 11:56:22 > {'config_path': None, 'data_dir': None, 'train_path': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/train.txt', 'val_path': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/val.txt', 'dictionary_path': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/dict.txt', 'checkpoint_dir': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/checkpoints/', 'log_output': None, 'dataset_limit': None, 'epochs': 100, 'batch_size': 16, 'print_every': 1, 'save_every': 10, 'vocabulary_size': 60000, 'max_len': 512, 'lr': 0.001, 'clip_grads': False, 'layers_count': 1, 'hidden_size': 128, 'heads_count': 2, 'd_ff': 128, 'dropout_prob': 0.1, 'device': 'cuda:0', 'function': <function pretrain at 0x7f942c367b70>}
[INFO] 2020-05-04 11:56:22 > Constructing dictionaries...
[INFO] 2020-05-04 11:56:23 > dictionary vocabulary : 60000 tokens
[INFO] 2020-05-04 11:56:23 > Loading datasets...
1374it [00:11, 115.92it/s]
344it [00:05, 68.72it/s]
[INFO] 2020-05-04 11:56:40 > Train dataset size : 1828898
[INFO] 2020-05-04 11:56:40 > Building model...
[INFO] 2020-05-04 11:56:40 > BERT(
  (encoder): TransformerEncoder(
    (encoder_layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attention_layer): Sublayer(
          (sublayer): MultiHeadAttention(
            (query_projection): Linear(in_features=128, out_features=128, bias=True)
            (key_projection): Linear(in_features=128, out_features=128, bias=True)
            (value_projection): Linear(in_features=128, out_features=128, bias=True)
            (final_projection): Linear(in_features=128, out_features=128, bias=True)
            (dropout): Dropout(p=0.1)
            (softmax): Softmax()
          )
          (layer_normalization): LayerNormalization()
        )
        (pointwise_feedforward_layer): Sublayer(
          (sublayer): PointwiseFeedForwardNetwork(
            (feed_forward): Sequential(
              (0): Linear(in_features=128, out_features=128, bias=True)
              (1): Dropout(p=0.1)
              (2): GELU()
              (3): Linear(in_features=128, out_features=128, bias=True)
              (4): Dropout(p=0.1)
            )
          )
          (layer_normalization): LayerNormalization()
        )
        (dropout): Dropout(p=0.1)
      )
    )
  )
  (token_embedding): Embedding(60000, 128)
  (positional_embedding): PositionalEmbedding(
    (positional_embedding): Embedding(512, 128)
  )
  (segment_embedding): SegmentEmbedding(
    (segment_embedding): Embedding(2, 128)
  )
  (token_prediction_layer): Linear(in_features=128, out_features=60000, bias=True)
  (classification_layer): Linear(in_features=128, out_features=2, bias=True)
)
[INFO] 2020-05-04 11:56:40 > 15585634 parameters
[INFO] 2020-05-04 11:56:40 > Start training...
  0%| | 0/114307 [00:00<?, ?it/s]
  0%| | 0/52472 [00:00<?, ?it/s]
[INFO] 2020-05-04 11:56:47 > Epoch: 0 Progress: 0.0% Elapsed: 0:00:03 Examples/second: 5e+05 Train Loss: inf Val Loss: inf Train Metrics: [inf] Val Metrics: [inf] Learning rate: 1.768e-07
[INFO] 2020-05-04 11:56:48 > Saved model to /home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/checkpoints/epoch=000-val_loss=inf-val_metrics=inf.pth
[INFO] 2020-05-04 11:56:48 > Current best model is /home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/checkpoints/epoch=000-val_loss=inf-val_metrics=inf.pth
  5%|███▊ | 5364/114307 [02:28<52:08, 34.82it/s]
Traceback (most recent call last):
  File "main.py", line 34, in <module>
    main()
  File "main.py", line 30, in main
    args.function(**config, config=config)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/train.py", line 104, in pretrain
    trainer.run(epochs=epochs)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/trainer.py", line 98, in run
    train_epoch_loss, train_epoch_metrics = self.run_epoch(self.train_dataloader, mode='train')
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/trainer.py", line 64, in run_epoch
    predictions, batch_losses = self.loss_model(inputs, targets)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/loss_models.py", line 17, in forward
    outputs = self.model(inputs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/model/bert.py", line 64, in forward
    embedded_sources = token_embedded + positional_embedded + segment_embedded
RuntimeError: The size of tensor a (515) must match the size of tensor b (512) at non-singleton dimension 1

AlexPak, May 04 '20 06:05
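
The RuntimeError above reports a mismatch at dimension 1 (sequence length): one operand has 515 positions and the other 512, while the logged config sets max_len to 512 and the model prints a positional table of Embedding(512, 128). One plausible reading, which nothing in this thread confirms, is that some training example reaches the model with more than max_len tokens. 515 is exactly 512 plus 3, which would be consistent with a few special tokens being appended after clipping, but the log does not show this. The sketch below is plain Python, not code from the repository: `find_overlong` and `truncate_example` are hypothetical helper names, and the data is made up to mirror the numbers in the error.

```python
MAX_LEN = 512  # 'max_len' from the logged config; matches Embedding(512, 128)


def find_overlong(examples, max_len=MAX_LEN):
    """Return (index, length) for each example longer than max_len."""
    return [(i, len(ids)) for i, ids in enumerate(examples) if len(ids) > max_len]


def truncate_example(token_ids, segment_ids, max_len=MAX_LEN):
    """Clip token ids and segment ids to the same first max_len positions."""
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have equal length")
    return token_ids[:max_len], segment_ids[:max_len]


# Toy data (made up): the second example is 515 tokens long, as in the error.
examples = [[1] * 100, [1] * 515, [1] * 512]
overlong = find_overlong(examples)

# Hypothetical segment ids for the 515-token example: 258 zeros, then 257 ones.
tokens, segments = truncate_example(examples[1], [0] * 258 + [1] * 257)
```

Running a check like `find_overlong` over the tokenized dataset before training would show whether any example exceeds max_len. Plain truncation also drops whatever sits at the end of the sequence, including any trailing special token, so filtering or splitting long examples may be the better choice depending on how the dataset builds its inputs.
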