self-attentive-parser

Sentence length limit

Open FengXuas opened this issue 5 years ago • 3 comments

When my sentence length exceeds 300, I get an error. I then set SENTENCE_MAX_LEN = 3000, but another error occurred: "tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[483] = 483 is not in [0, 300) [[{{node GatherV2_1}}]]". What is the cause, and is there any way to solve it?

FengXuas — Aug 07 '19 02:08

The 300-word limit is inherent to the pre-trained model; it can't be changed without modifying the model. There are multiple places in the model architecture that place a limit on the maximum sentence length, each for a different reason. Overall I'd say that removing the length limit is not straightforward and would require re-thinking several aspects of the overall parser architecture.
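To make the reported error concrete, here is a minimal, illustrative sketch (not the parser's actual code) of what happens when position indices are gathered from a fixed 300-row embedding table; the table size and dimensions are assumptions for illustration only:

```python
# Illustrative sketch: why an index past the position-embedding table raises
# the reported InvalidArgumentError. Names and sizes are hypothetical.
import numpy as np
import tensorflow as tf

MAX_LEN = 300  # size of the pre-trained position table
position_table = tf.constant(np.random.randn(MAX_LEN, 64), dtype=tf.float32)

positions = tf.range(484)  # a 484-token sentence -> indices 0..483
# On CPU this raises an InvalidArgumentError like
# "indices[483] = 483 is not in [0, 300)", because the table only has rows
# 0..299; raising SENTENCE_MAX_LEN alone does not add rows to this table.
embeddings = tf.gather(position_table, positions)
```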

nikitakit — Aug 07 '19 03:08

I am using the benepar_en2 model. If the sentence length limit is removed during the model-training phase, is it possible to produce a model that performs the same as benepar_en2? I followed the training procedure provided in your GitHub repository.

FengXuas — Aug 07 '19 04:08

BERT has an inherent length limit of 512 sub-word tokens, so you can only raise the limit from 300 words to 512 sub-words before you hit a constraint that requires re-thinking the overall architecture.
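As a quick sanity check on whether a given sentence even fits in that window, a hedged sketch using the Hugging Face `transformers` tokenizer (which is not what benepar uses internally, but gives a comparable sub-word count) is shown below:

```python
# Hedged sketch: counting BERT sub-word tokens for a sentence.
# The tokenizer choice is an assumption; benepar's internal tokenization may differ.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentence = " ".join(["word"] * 400)        # a 400-word sentence
subwords = tokenizer.tokenize(sentence)
print(len(subwords))                       # must stay under 512, minus [CLS]/[SEP]
```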

If the 512 sub-word limit is fine, you can load the TensorFlow graph for benepar_en2, find the 300×N embedding matrix for positions, and append another 212 entries populated with random values. This would be far easier than re-training the model and would have the same effect. There are no sentences longer than 300 words in the training data, so those position embeddings wouldn't be trained anyway even if you re-ran the training code from scratch. This relates to another limitation of the current modeling approach: when parsing extra-long sentences, you will be using randomly-initialized parameters that were never touched during training (and hoping that the parser works anyway).
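A minimal sketch of the idea at the array level is given below; the file names are hypothetical, and the actual graph surgery on benepar_en2 (locating the variable, re-exporting the graph) depends on how the model was saved:

```python
# Hedged sketch: extend a 300-row position-embedding matrix to 512 rows by
# appending randomly initialized entries. File names are placeholders.
import numpy as np

old_table = np.load("position_embeddings.npy")   # hypothetical dump, shape (300, d)
d_model = old_table.shape[1]
extra = np.random.normal(scale=0.02, size=(512 - 300, d_model)).astype(old_table.dtype)
new_table = np.concatenate([old_table, extra], axis=0)   # shape (512, d)
np.save("position_embeddings_512.npy", new_table)
```

Note that, as described above, the appended 212 rows would remain untrained, so parses of sentences longer than 300 words would rely on parameters the model has never seen during training.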

nikitakit — Aug 09 '19 21:08