
XLMRobertaTokenizer: incorrect description for the build_inputs_with_special_tokens function

Open • TianmengChen opened this issue 1 year ago • 0 comments

System Info

  • transformers version: 4.41.2
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.10.9
  • Huggingface_hub version: 0.24.5
  • Safetensors version: 0.4.4
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer

model_name_or_path = "BAAI/bge-reranker-base"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

query = ["what is openvino" , "text"]
query_passage_pairs = "what is openvino </s></s> text"

input_tensors_pairs = tokenizer(
    query_passage_pairs, padding=True, truncation=True, return_tensors="pt"
)
input_tensors_query  = tokenizer(
    query , padding=True, truncation=True, return_tensors="pt"
)

Expected behavior

A different result between input_tensors_pairs and input_tensors_query.

TianmengChen · Aug 22 '24 05:08