
XLMRobertaTokenizer: incorrect description for the build_inputs_with_special_tokens function

Open • TianmengChen opened this issue 1 year ago • 0 comments

System Info

  • transformers version: 4.41.2
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.10.9
  • Huggingface_hub version: 0.24.5
  • Safetensors version: 0.4.4
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer

model_name_or_path = "BAAI/bge-reranker-base"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

query = ["what is openvino" , "text"]
query_passage_pairs = "what is openvino </s></s> text"

input_tensors_pairs = tokenizer(
    query_passage_pairs, padding=True, truncation=True, return_tensors="pt"
)
input_tensors_query  = tokenizer(
    query , padding=True, truncation=True, return_tensors="pt"
)

Expected behavior

A different result between input_tensors_pairs and input_tensors_query.

TianmengChen · Aug 22 '24 05:08