DNABERT_2

Special token treatment.

Open prwoolley opened this issue 2 months ago • 0 comments

By default, the tokenizer adds special tokens to the "input_ids": [CLS] at the beginning and [SEP] at the end of each tokenized sequence. Was DNABERT-2 trained with these tokens present? If so, has the [CLS] token embedding been used for fine-tuning, as an alternative to mean pooling?
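To make the two options concrete, here is a minimal sketch of [CLS] pooling versus mean pooling. It does not say which one DNABERT-2 was trained or fine-tuned with. The function names `cls_pool` and `mean_pool` are hypothetical, and the NumPy arrays stand in for a model's last hidden state, attention mask, and special-tokens mask; the toy values are made up for illustration.

```python
import numpy as np


def cls_pool(hidden_states):
    """Take the embedding at position 0, where [CLS] sits when the
    tokenizer adds special tokens (assumed here, not confirmed for DNABERT-2)."""
    return hidden_states[:, 0, :]


def mean_pool(hidden_states, attention_mask, special_tokens_mask=None):
    """Average token embeddings over non-padding positions.

    If special_tokens_mask is given (1 = special token such as [CLS],
    [SEP] or padding), those positions are left out of the average too.
    """
    mask = attention_mask.astype(hidden_states.dtype)
    if special_tokens_mask is not None:
        mask = mask * (1 - special_tokens_mask)
    mask = mask[:, :, None]                      # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)  # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# Toy batch: 2 sequences, 4 positions, hidden size 3.
# Sequence 0: [CLS] tok tok [SEP]; sequence 1: [CLS] tok [SEP] [PAD].
hidden = np.arange(2 * 4 * 3, dtype=np.float64).reshape(2, 4, 3)
attention_mask = np.array([[1, 1, 1, 1], [1, 1, 1, 0]])
special_tokens_mask = np.array([[1, 0, 0, 1], [1, 0, 1, 1]])

cls_embedding = cls_pool(hidden)
mean_with_special = mean_pool(hidden, attention_mask)
mean_without_special = mean_pool(hidden, attention_mask, special_tokens_mask)
print(cls_embedding.shape, mean_with_special.shape, mean_without_special.shape)
```

Whether [CLS] and [SEP] should count toward the mean depends on how the model was trained, which is the open question here.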

Thanks for the model!

prwoolley, Apr 10 '24 13:04