
dmis-lab/biobert-base-cased-v1.1 Tokenizer lower cases the input

Open suamin opened this issue 3 years ago • 0 comments

Hi,

Thank you for releasing this codebase.

I noticed that when dmis-lab/biobert-base-cased-v1.1 is loaded from HF Models with BertTokenizer.from_pretrained, the tokenizer defaults to do_lower_case=True. The lack of a tokenizer_config.json here, compared to this, could be the reason.

Is this behavior intended? It is unexpected for a cased model, and a user would not notice it unless they probe for it.

I'm using transformers==4.12.5.
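
For anyone who wants to probe for this, here is a minimal sketch of how it could be checked and worked around. The helper names (lowercases_input, load_cased_tokenizer, report) and the probe string are my own and purely illustrative; the only library call assumed is BertTokenizer.from_pretrained, with do_lower_case=False passed as an explicit override. The result may differ across transformers versions or if the model repo has since gained a tokenizer_config.json, so no expected output is shown.

```python
from typing import Callable, List

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"


def lowercases_input(tokenize: Callable[[str], List[str]],
                     probe: str = "BRCA1 Aspirin") -> bool:
    """Return True if no uppercase character of the probe survives tokenization.

    `tokenize` is any callable mapping a string to a list of tokens,
    for example the `tokenize` method of a loaded tokenizer.
    """
    tokens = tokenize(probe)
    # Skip bracketed special tokens such as [UNK], which are uppercase
    # regardless of how the input text was cased.
    pieces = [t for t in tokens if not (t.startswith("[") and t.endswith("]"))]
    return not any(ch.isupper() for tok in pieces for ch in tok)


def load_cased_tokenizer(model_name: str = MODEL_NAME):
    """Load the tokenizer with lowercasing explicitly disabled (workaround)."""
    # Imported lazily so lowercases_input works without transformers installed.
    from transformers import BertTokenizer

    # Keyword arguments given here override the defaults used when the
    # model repo does not specify them.
    return BertTokenizer.from_pretrained(model_name, do_lower_case=False)


def report(model_name: str = MODEL_NAME) -> None:
    """Print whether the default and the overridden tokenizer lowercase text."""
    from transformers import BertTokenizer

    default_tok = BertTokenizer.from_pretrained(model_name)
    cased_tok = load_cased_tokenizer(model_name)
    print("default lowercases input: ", lowercases_input(default_tok.tokenize))
    print("override lowercases input:", lowercases_input(cased_tok.tokenize))
```

Calling report() downloads the tokenizer files and prints one line per tokenizer. Passing do_lower_case=False at load time is only a user-side workaround; it does not change what the model repo ships.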

Thanks, Saad

suamin, Jan 06 '22 12:01