
Why does get_special_tokens_mask append a [1] at the end while build_inputs_with_special_tokens does not append a [SEP] at the end for a single input sequence?

Open zhihuacc opened this issue 2 years ago • 3 comments

Hi, I found that in this line [build_inputs_with_special_tokens](url) the returned list has a [1] appended at the end for a single input sequence, while the list returned [here](url) does NOT have a [SEP] appended for the same case. Why is that?

zhihuacc avatar Apr 25 '22 11:04 zhihuacc

We remove [SEP] for single-sentence inputs because it has a negligible effect on pre-training.

LiJunnan1992 avatar Apr 25 '22 12:04 LiJunnan1992

But why does get_special_tokens_mask still append a [1]? I thought this [1] is for [SEP], right?

zhihuacc avatar Apr 25 '22 12:04 zhihuacc

Yes, you are right. I have modified the code so that the [1] is not appended. Thank you!

LiJunnan1992 avatar Apr 25 '22 12:04 LiJunnan1992
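
For anyone landing here later, the consistent behavior discussed in this thread can be sketched roughly as below. This is a standalone illustration, not the repository's actual code: the two functions are simplified free-function stand-ins for the tokenizer methods, the token ids 101 and 102 are assumed placeholder values for [CLS] and [SEP], and the two-sequence branch is assumed to follow the usual BERT layout. The point is only that, for a single sequence, the mask should have the same length as the ids and should not mark a trailing [SEP] that was never added.

```python
from typing import List, Optional

# Assumed placeholder ids for illustration only.
CLS_ID = 101
SEP_ID = 102


def build_inputs_with_special_tokens(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Add special tokens; a single sequence gets [CLS] but no trailing [SEP]."""
    if token_ids_1 is None:
        return [CLS_ID] + token_ids_0
    # Assumed standard BERT pair layout: [CLS] A [SEP] B [SEP]
    return [CLS_ID] + token_ids_0 + [SEP_ID] + token_ids_1 + [SEP_ID]


def get_special_tokens_mask(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Return 1 for special tokens and 0 for sequence tokens."""
    if token_ids_1 is None:
        # No trailing 1 here, because no [SEP] is appended above.
        return [1] + [0] * len(token_ids_0)
    return [1] + [0] * len(token_ids_0) + [1] + [0] * len(token_ids_1) + [1]


# Sanity check: ids and mask line up for a single sequence.
ids = [7, 8, 9]
inputs = build_inputs_with_special_tokens(ids)
mask = get_special_tokens_mask(ids)
assert len(inputs) == len(mask)
```

With the extra [1] still in place, the mask for a single sequence would be one element longer than the ids, which is the mismatch raised above.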