stanford_alpaca How to modify llama-Xb-hf/tokenizer

How to modify llama-Xb-hf/tokenizer_config.json from HuggingFace?

Open • foreveronehundred opened this issue 1 year ago • 1 comment

As the title says, I found that the content of llama-Xb-hf/tokenizer_config.json looks like this:

{"bos_token": "", "eos_token": "",
 "model_max_length": 1000000000000000019884624838656,
 "tokenizer_class": "LLaMATokenizer", "unk_token": ""}

How did your team modify this file so that the experiment can be run successfully?

Here is my modification. Is this correct?

{"bos_token": "<s>", "eos_token": "</s>",
 "model_max_length": 1000000000000000019884624838656,
 "tokenizer_class": "LlamaTokenizer", "unk_token": "<unk>"}
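
For anyone who wants to apply the same change with a script instead of editing by hand, here is a minimal sketch using only the Python standard library. The helper name `patch_tokenizer_config` is hypothetical, and the token values are simply the ones from the modification above; nothing in this thread confirms that they match what the Alpaca team used.

```python
import json
from pathlib import Path

# Special-token values copied from the proposed modification above.
# Assumption: these are the right tokens for your checkpoint.
SPECIAL_TOKENS = {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>"}


def patch_tokenizer_config(path, tokenizer_class="LlamaTokenizer"):
    """Fill in empty special tokens and set the tokenizer class, in place."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Only overwrite entries that are missing or empty strings.
    for key, value in SPECIAL_TOKENS.items():
        if not config.get(key):
            config[key] = value

    # Replace the old class name ("LLaMATokenizer") with the new spelling.
    config["tokenizer_class"] = tokenizer_class

    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

It would be called as `patch_tokenizer_config("llama-Xb-hf/tokenizer_config.json")`. Other keys such as `model_max_length` are left untouched.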

Apr 21 '23 00:04 foreveronehundred

I have the same issue.

Apr 23 '23 08:04 zhihui-shao
