
How did BioRED process relations that span multiple sentences?

Open Meiling-Sun opened this issue 2 months ago • 2 comments

Hi, thanks for this amazing work. I have some questions. The annotation is done at the abstract level, but when you use the PubMedBERT model for relation extraction, how does the tokenizer handle sentence segmentation? As far as I know, BERT's maximum input length is 512 tokens, so how do you proceed if the token length of an abstract exceeds 512?

Another question: how did you handle coreference during annotation? Did you also annotate pronouns such as 'it' and 'this' as entities? If so, do they become noise for the NER task? Before the RE task, do you replace them with the original entity names, keep them as they are, or use some other strategy?
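To make the 512-token question concrete, here is a minimal sketch of one common workaround for long inputs: splitting the token sequence into overlapping windows. This is only an illustration of what I am asking about, not a claim about what BioRED actually does. The names `sliding_windows`, `max_len`, and `stride` are hypothetical, and a plain list of strings stands in for real WordPiece tokens.

```python
from typing import List


def sliding_windows(tokens: List[str], max_len: int = 512, stride: int = 128) -> List[List[str]]:
    """Split `tokens` into windows of at most `max_len` tokens.

    Consecutive windows share `stride` tokens, so an entity pair that falls
    near a window boundary still appears together in at least one window
    (as long as the pair is no more than `stride` tokens apart).
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("require max_len > 0 and 0 <= stride < max_len")

    # Short inputs fit in a single window and need no splitting.
    if len(tokens) <= max_len:
        return [list(tokens)]

    step = max_len - stride  # how far each window start advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return windows


if __name__ == "__main__":
    # Stand-in for a tokenized abstract that exceeds the 512-token limit.
    abstract_tokens = [f"tok{i}" for i in range(1000)]
    for window in sliding_windows(abstract_tokens):
        print(len(window), window[0], window[-1])
```

With this kind of approach, a relation whose two entities are further apart than the overlap could still be split across windows, which is why I am curious how the abstract-level relations were handled.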

Meiling-Sun · May 06 '24 13:05