Why the max_seq_length = 512 for XLNet?
Hi,
Just a conceptual question: the paper mentions that XLNet derives parts of its architecture from Transformer-XL, which isn't limited to a fixed context, yet the hyperparameters section says the max length is 512.
Can you please help me better understand this?
Thanks!
I was having the same question. @zihangdai could you please help us with this?
Or maybe @kimiyoung?
Assuming you are familiar with Transformer-XL, max_seq_length is the length of each training segment within which you can back-propagate (the gradient does not flow into the cached memory in Transformer-XL).
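A minimal PyTorch-style sketch of that training loop (the model interface, `model(tokens, mems) -> (logits, new_mems)`, is an assumption for illustration, not the repo's actual API):

```python
def train_on_segments(model, segments, optimizer, loss_fn):
    """Transformer-XL-style segment recurrence (illustrative sketch).

    Each element of `segments` is a (tokens, targets) pair of at most
    max_seq_length tokens. Back-prop stays within the current segment;
    the memory carried over from earlier segments is detached, so no
    gradient ever crosses a segment boundary.
    """
    mems = None
    for tokens, targets in segments:
        logits, new_mems = model(tokens, mems)
        loss = loss_fn(logits, targets)
        optimizer.zero_grad()
        loss.backward()   # gradients flow only within this segment
        optimizer.step()
        # Reuse the hidden states as context, but cut the gradient path.
        mems = [m.detach() for m in new_mems]
```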
Then, why the value 512?

(1) Longer sequences require more pretraining time.
(2) Most of the tasks considered at the time did not really require handling long sequences: GLUE -> 128, SQuAD -> 512. RACE performance can be improved slightly if you also increase max_seq_length during finetuning.

Technically, you can increase the sequence length during finetuning if you want. But if it's too long, generalization may not be good, as such long sequences are not seen during pretraining.
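To connect this back to the original question: the 512 cap applies to each back-prop segment, not to the total visible context. A hypothetical sketch (same assumed `model(segment, mems)` interface as above) of how the recurrent memory lets segments of at most 512 tokens still see earlier context at inference time:

```python
def encode_long_document(model, token_segments):
    """Illustrative inference-time sketch, not the repo's API.

    Even with each segment capped at max_seq_length = 512, earlier
    context stays visible through the recurrent memory, which (unlike
    gradients) does cross segment boundaries.
    """
    mems = None
    outputs = []
    for segment in token_segments:        # each segment <= 512 tokens
        out, mems = model(segment, mems)  # mems carries prior context
        outputs.append(out)
    return outputs
```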
@zihangdai thank you for your fast reply!