REDN
Token indices sequence length is longer than the specified maximum sequence length for this model (708 > 512). Running this sequence through the model will result in indexing errors
I am getting the warning "Token indices sequence length is longer than the specified maximum sequence length for this model (730 > 512). Running this sequence through the model will result in indexing errors". Will that cause a problem? I couldn't find a truncation operation or a max_length setting in BERTHiddenStateEncoder, and since the BERT model is limited to 512 tokens, will this lead to a drop in performance?
Thanks
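For reference, here is a minimal sketch of where that message comes from; it is emitted by the Hugging Face tokenizer when a sequence is encoded without truncation, and explicit truncation silences it. This is not the repo's BERTHiddenStateEncoder code; the model name and example text are placeholders.

```python
# Minimal sketch (not REDN code): the warning above is logged by the
# Hugging Face tokenizer whenever an encoded sequence exceeds the model limit.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
long_text = " ".join(["word"] * 800)  # hypothetical over-length sentence

# Without truncation: triggers the warning; feeding these ids to BERT would
# fail, since its position embeddings only cover 512 tokens.
ids_raw = tokenizer.encode(long_text)

# With explicit truncation (recent transformers versions) the sequence is
# cut down to the 512-token limit and no indexing error can occur.
ids_cut = tokenizer.encode(long_text, truncation=True, max_length=512)
print(len(ids_raw), len(ids_cut))  # roughly 802 vs 512
```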
The problem is caused by NYT10's indexing method: its indices are char-level, while we need word-level indices. We have written this translation here. What you need to do is uncomment this line, then delete your pkl file and try again. Besides, we actually remove all sentences whose length is larger than 512; the code is in SentenceREDataset.
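To illustrate the filtering described above, here is a minimal sketch of dropping samples whose tokenized length exceeds BERT's 512-token limit. It is not the actual SentenceREDataset code; the sample structure and the "text" field name are placeholders.

```python
# Minimal sketch of length-based filtering; the real logic lives in
# SentenceREDataset, and the sample format here is only illustrative.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
MAX_LEN = 512

samples = [
    {"text": "Barack Obama was born in Honolulu ."},
    {"text": " ".join(["token"] * 800)},  # would overflow BERT's limit
]

def fits_in_bert(sample):
    # encode() adds [CLS]/[SEP], so the count matches what BERT would see.
    # Note: this call itself still triggers the tokenizer warning for the
    # over-length sample, even though that sample is discarded afterwards.
    return len(tokenizer.encode(sample["text"])) <= MAX_LEN

filtered = [s for s in samples if fits_in_bert(s)]
print(f"kept {len(filtered)} of {len(samples)} samples")  # kept 1 of 2 samples
```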
I searched a bit, and I think the tokenizer can emit that warning even when over-length sentences are removed afterwards, so it may appear even if none are actually fed to the model. I looked at the code and I don't see where sentences are removed by max length, so I'm not sure why the warning is raised, but after uncommenting that line the training started. I will keep you updated if a problem occurs, or after evaluation if I cannot get the results that you reported.
Thanks a lot for the quick responses :)
One epoch took around 4.5 hours. Validation started at 0.85 micro F1 and keeps decreasing; after the 7th epoch it was 0.80. Also, for NYT10 the max epoch is 100 in the code. Is that a typo, since it is 10 in the original paper? The evaluation after the 6th epoch is below:
{'micro_f1': 0.8095688346036508, 'micro_p': 0.9041182682152261, 'micro_r': 0.7329224447867261, 'acc': 0.878153846153666,
 'without_na_res': {'micro_f1': 0.8095688346036508, 'micro_p': 0.9041182682152261, 'micro_r': 0.7329224447867261, 'acc': 0.878153846153666},
 'na_res': {'micro_f1': 0.0, 'micro_p': 0.0, 'micro_r': 0.0, 'acc': 0.0},
 'without_na_micro_f1': 0.8095688346036508,
 'normal': {'micro_f1': 0.9303428149628392, 'micro_p': 0.9351173020524431, 'micro_r': 0.9256168359938586, 'acc': 0.9256168359938586},
 'over_lapping': {'micro_f1': 0.6763617128988916, 'micro_p': 0.8458994708989115, 'micro_r': 0.5634361233477694, 'acc': 0.7968847352019958},
 'multi_label': {'micro_f1': 0.6370757175524843, 'micro_p': 0.8758076094753224, 'micro_r': 0.5006155108738201, 'acc': 0.8293677770218699},
 'triple_res': {'0': {'micro_f1': 0.0, 'micro_p': 0.0, 'micro_r': 0.0, 'acc': 0.0},
                '1': {'micro_f1': 0.9300956580721509, 'micro_p': 0.9349112426032046, 'micro_r': 0.9253294289894124, 'acc': 0.9253294289894124},
                '2': {'micro_f1': 0.7416331989737418, 'micro_p': 0.8629283489087612, 'micro_r': 0.6502347417835288, 'acc': 0.8219584569724807},
                '3': {'micro_f1': 0.6360814069160526, 'micro_p': 0.8632958801490044, 'micro_r': 0.5035499726922428, 'acc': 0.8144876325081144}}}
Still no luck: as I keep training, the F1 keeps decreasing. Why might that occur?