
Domain specific terms

Open lfoppiano opened this issue 2 years ago • 0 comments

Hi,
I want to continue pretraining SciBERT on additional data, and I want to enlarge the vocabulary with 100 additional "domain-specific" terms, using the vocabulary entries that are reserved for such usage. I've figured out a way to extract a list of candidate terms from my data.

Let's suppose I have the following most relevant terms:

"polymer"
"materials"
"chemistry"
"polymers"

What should I do with terms such as polymer and polymers? Should I include them both, or keep only the singular?

Does anybody have any information or recommendations on this?
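To make the question concrete, here is a minimal sketch of greedy longest-match WordPiece splitting, the scheme BERT-style tokenizers use for a single word. The `wordpiece` helper and `toy_vocab` are hypothetical and invented for illustration; this is not SciBERT's real vocabulary, and a real tokenizer also applies lowercasing and punctuation splitting first. It only shows how the plural might be split before and after the singular is added.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, NOT the real SciBERT vocabulary.
toy_vocab = {"poly", "##mer", "##s", "material", "chemistry"}

before = wordpiece("polymers", toy_vocab)
after = wordpiece("polymers", toy_vocab | {"polymer"})

print(before)  # ['poly', '##mer', '##s']
print(after)   # ['polymer', '##s']
```

In this toy setting, adding only the singular already shortens the plural to two pieces, provided a `##s` piece exists. Whether that is good enough, or whether "polymers" deserves its own slot, is exactly what I am unsure about.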

lfoppiano, Oct 04 '21 02:10