
Exception: expected value at line 1 column 1

Open wccccp opened this issue 2 years ago • 5 comments

System Info

  • transformers version: 4.28.0.dev0
  • Platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.31
  • Python version: 3.9.16
  • Huggingface_hub version: 0.13.2
  • PyTorch version (GPU?): 2.0.0+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker @sgugger @gante

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

File "/mnt1/wcp/BEELE/BELLE-main/generate_instruction.py", line 28, in tokenizer = AutoTokenizer.from_pretrained(checkpoint) File "/home/appuser/miniconda3/envs/wcppy39/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 679, in from_pretrained return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs) File "/home/appuser/miniconda3/envs/wcppy39/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained return cls._from_pretrained( File "/home/appuser/miniconda3/envs/wcppy39/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained tokenizer = cls(*init_inputs, **init_kwargs) File "/home/appuser/miniconda3/envs/wcppy39/lib/python3.9/site-packages/transformers/models/bloom/tokenization_bloom_fast.py", line 118, in init super().init( File "/home/appuser/miniconda3/envs/wcppy39/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 111, in init fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file) Exception: expected value at line 1 column 1

Expected behavior

I expect the script to run and the tokenizer to load successfully.

wccccp avatar Mar 27 '23 05:03 wccccp

Hey @wccccp 👋

That exception is not due to transformers, but rather due to a .json file (or similar). There is probably something fishy with your tokenizer checkpoint.

See this Stack Overflow issue.
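
This error generally means the file that the tokenizer tries to parse is not valid JSON (empty, an HTML error page, or a Git LFS pointer instead of the real content). A minimal sketch of a check, assuming the checkpoint lives in a local directory; the path below is a placeholder, not from this thread:

import json
import os

checkpoint_dir = "path/to/checkpoint"  # placeholder: the directory passed to from_pretrained
tokenizer_file = os.path.join(checkpoint_dir, "tokenizer.json")

# Inspect the first bytes: an empty file, an HTML page, or a
# "version https://git-lfs.github.com/spec/v1" pointer all indicate a bad download.
with open(tokenizer_file, "r", encoding="utf-8") as f:
    print(repr(f.read(200)))

# Parsing it as JSON reproduces the same kind of failure on a corrupted file:
# json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
with open(tokenizer_file, "r", encoding="utf-8") as f:
    json.load(f)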

gante avatar Mar 27 '23 10:03 gante

Hey @wccccp 👋

That exception is not due to transformers, but rather to a .json file (or similar). There is probably something wrong with your tokenizer checkpoint.

See this Stack Overflow issue.

You are right, the problem is solved.

wccccp avatar Mar 28 '23 06:03 wccccp

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 26 '23 15:04 github-actions[bot]

This is exactly what is happening for me:

I'm working on a personal project, and this error occurs while using the official tokenizer for the RWKV model through LangChain, which uses the rwkv pip package and the tokenizers module.

File "/content/Intellique/main.py", line 442, in main() File "/content/Intellique/main.py", line 408, in main result = execution_agent(OBJECTIVE, task["task_name"]) File "/content/Intellique/main.py", line 363, in execution_agent return call_execution_llm(prompt) File "/content/Intellique/main.py", line 290, in call_execution_llm excu_llm = rwkv_llm() File "/content/Intellique/main.py", line 42, in rwkv_llm model = RWKV(model=model_path, tokens_path="/content/Intellique/20B_tokenizer.json", strategy='cuda fp16i8 *20 -> cuda fp16') File "pydantic/main.py", line 339, in pydantic.main.BaseModel.init task_name = task_parts[1].strip() File "pydantic/main.py", line 1102, in pydantic.main.validate_model File "/usr/local/lib/python3.9/dist-packages/langchain/llms/rwkv.py", line 113, in validate_environment values["tokenizer"] = tokenizers.Tokenizer.from_file(values["tokens_path"]) Exception: expected value at line 1 column 1

xdevfaheem avatar Apr 28 '23 13:04 xdevfaheem

Has anyone found a solution for this? @wccccp ...

xdevfaheem avatar Apr 28 '23 13:04 xdevfaheem

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 22 '23 15:05 github-actions[bot]

@TheFaheem make sure that you downloaded the actual tokenizer.json and tokenizer.model files. Cross-check the sizes of both files on the Hugging Face Hub against your local copies.
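
A minimal sketch of that cross-check, assuming huggingface_hub is available; the repo id and local directory below are placeholders, not taken from this thread:

import os
from huggingface_hub import hf_hub_download

repo_id = "some-org/some-checkpoint"    # placeholder: the checkpoint you are loading
local_dir = "path/to/local/checkpoint"  # placeholder: your local copy

for filename in ("tokenizer.json", "tokenizer.model"):
    try:
        fresh = hf_hub_download(repo_id=repo_id, filename=filename)
    except Exception:
        # not every checkpoint ships both files
        print(f"{filename}: not found on the Hub for this repo")
        continue
    local = os.path.join(local_dir, filename)
    if os.path.exists(local):
        print(f"{filename}: hub {os.path.getsize(fresh)} bytes, local {os.path.getsize(local)} bytes")
    else:
        print(f"{filename}: missing locally")

If the sizes differ (a tokenizer.json of a few hundred bytes is a typical sign of an LFS pointer), re-download the file and the error should go away.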

vamsivallepu avatar Apr 17 '24 10:04 vamsivallepu