
Failed to train the model

Open LeeReeny opened this issue 2 years ago • 2 comments

Hello! When I try to run the code, I get this error:

```
Loaded train features
Loaded dev features
Loaded test features
Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/home/test2/anaconda3/envs/pytorch/lib/python3.8/site-packages/apex/__init__.py:68: DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use PyTorch AMP
  warnings.warn(msg, DeprecatedFeatureWarning)
Traceback (most recent call last):
  File "train.py", line 440, in <module>
    main()
  File "train.py", line 437, in main
    train(args, model, train_features, dev_features, test_features, label_loader)
  File "train.py", line 109, in train
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1", verbosity=0)
  File "/home/test2/anaconda3/envs/pytorch/lib/python3.8/site-packages/apex/amp/frontend.py", line 362, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/home/test2/anaconda3/envs/pytorch/lib/python3.8/site-packages/apex/amp/_initialize.py", line 235, in _initialize
    handle = amp_init(loss_scale=properties.loss_scale, verbose=(_amp_state.verbosity == 2))
  File "/home/test2/anaconda3/envs/pytorch/lib/python3.8/site-packages/apex/amp/amp.py", line 111, in init
    if compat.tensor_is_float_tensor():
  File "/home/test2/anaconda3/envs/pytorch/lib/python3.8/site-packages/apex/amp/compat.py", line 14, in tensor_is_float_tensor
    x = torch.empty()
TypeError: empty() received an invalid combination of arguments - got (), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
```

My environment matches the requirements. Could you please offer some suggestions? Thank you so much.

LeeReeny avatar Feb 15 '23 09:02 LeeReeny
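As a side note, the `DeprecatedFeatureWarning` in the log suggests replacing `apex.amp` with native PyTorch AMP. A minimal sketch of what the `amp.initialize(model, optimizer, opt_level="O1")` pattern looks like with `torch.cuda.amp` instead; the model, optimizer, and loss below are illustrative placeholders, not the repo's actual `train.py` code:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins for the model/optimizer built in train.py.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

use_cuda = torch.cuda.is_available()
# GradScaler replaces apex's dynamic loss scaling; disabled on CPU it is a no-op.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

inputs, targets = torch.randn(8, 4), torch.randn(8, 2)

optimizer.zero_grad()
# autocast plays the role of opt_level="O1": selected ops run in lower precision.
with torch.autocast(device_type="cuda" if use_cuda else "cpu", enabled=use_cuda):
    loss = F.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
print(f"loss: {loss.item():.4f}")
```

This removes the `amp.initialize` call (and the apex dependency) entirely, which sidesteps the crash below.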

I've figured it out. It's a bug in apex: https://github.com/NVIDIA/apex/commit/ba027dd0bac621a5c3d16bfb90d73e2d48d2588e#r100574972

LeeReeny avatar Feb 15 '23 13:02 LeeReeny
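For anyone landing here with the same trace: the crash comes from apex's old compatibility check (`apex/amp/compat.py`, line 14 in the traceback) calling `torch.empty()` with no size argument, which PyTorch rejects. A quick sketch reproducing the root cause, independent of apex; updating apex to a source build that includes the commit linked above avoids the bad call:

```python
import torch

# Old apex called torch.empty() with no arguments; PyTorch requires
# an explicit size, so the bare call raises TypeError.
raised = False
try:
    torch.empty()
except TypeError:
    raised = True
print("torch.empty() without a size raises TypeError:", raised)

# With an explicit size the call succeeds.
x = torch.empty(0)
print("torch.empty(0).numel() =", x.numel())
```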


May I ask how you solved this problem?

SnowWangyue avatar Sep 14 '23 03:09 SnowWangyue