scibert
Domain specific terms
Hi,
I want to continue pretraining SciBERT on additional data, and I want to enlarge the vocabulary with 100 additional "domain-specific" terms, using the vocabulary entries that are reserved for such usage.
So I've figured out a way to extract a list of terms from my data.
Let's suppose I have the following most relevant terms:
"polymer"
"materials"
"chemistry"
"polymers"
What should I do with terms such as "polymer" and "polymers"? Include them both, or keep only the singular?
Does anybody have any information or a recommendation on this?
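For context, here is a minimal sketch of what I have in mind. It assumes the vocabulary file contains `[unusedN]` placeholder entries, as the original BERT vocabularies do (I have not confirmed this for every SciBERT vocabulary). The function name `fill_reserved_slots` and the toy vocabulary are hypothetical and only for illustration. The idea is to overwrite placeholders in place so the vocabulary size, and therefore the embedding matrix shape, stays the same.

```python
import re

# Matches placeholder entries such as "[unused0]" (assumed naming, as in BERT vocabularies).
UNUSED = re.compile(r"^\[unused\d+\]$")


def fill_reserved_slots(vocab, new_terms):
    """Replace [unusedN] placeholders with new terms, keeping vocab length fixed.

    Returns (new_vocab, placed, leftover), where `placed` are the terms that
    got a slot and `leftover` are the terms that did not fit.
    """
    existing = set(vocab)
    # Keep the input order; skip duplicates and terms already in the vocabulary.
    pending = []
    for term in new_terms:
        if term not in existing and term not in pending:
            pending.append(term)

    queue = iter(pending)
    new_vocab, placed = [], []
    for token in vocab:
        if UNUSED.match(token):
            term = next(queue, None)
            if term is not None:
                new_vocab.append(term)  # overwrite the placeholder in place
                placed.append(term)
                continue
        new_vocab.append(token)

    leftover = pending[len(placed):]
    return new_vocab, placed, leftover


# Toy vocabulary, not the real SciBERT one.
vocab = ["[PAD]", "[unused0]", "[unused1]", "[unused2]", "[UNK]", "chemistry", "poly", "##mer"]
terms = ["polymer", "materials", "chemistry", "polymers"]

new_vocab, placed, leftover = fill_reserved_slots(vocab, terms)
print(placed)    # terms that received a reserved slot
print(leftover)  # terms that did not fit
```

In this sketch "polymer" and "polymers" each consume one reserved slot, which is why the singular/plural question matters when only 100 slots are available.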