CNN_Nested_NER icon indicating copy to clipboard operation
CNN_Nested_NER copied to clipboard

Longformer integration

Open nishchaychawla opened this issue 3 years ago • 1 comments

Hi, I have a use case where the max_length will be more than 512. Can this model work with LongFormer, if so what all places will require changes? I see in padder there is max_length specified, and there are a few other checks as well. Will there be any logical issue with using Longformer?

Thanks for the open source.

nishchaychawla avatar Sep 01 '22 15:09 nishchaychawla

Thanks for your attention. I do not see other limitations. The tokenization part may need some adaptation, mainly for the add_prefix_space. https://github.com/yhcc/CNN_Nested_NER/blob/91ec7ec42ddf4ca70bb0be6c89d2e915a07a5501/data/ner_pipe.py#L12-L28 Actually you can just delete this line https://github.com/yhcc/CNN_Nested_NER/blob/91ec7ec42ddf4ca70bb0be6c89d2e915a07a5501/train.py#L103-L105 this line is just for faster batch preparation, the fastNLP framework will automatically handle the padding issue as long as the field is paddable.

yhcc avatar Sep 03 '22 08:09 yhcc