
How to modify llama-Xb-hf/tokenizer_config.json from HuggingFace?

Open · foreveronehundred opened this issue 1 year ago · 1 comment

As the title says, I found that llama-Xb-hf/tokenizer_config.json looks like this:

{"bos_token": "", "eos_token": "",
 "model_max_length": 1000000000000000019884624838656,
 "tokenizer_class": "LLaMATokenizer", "unk_token": ""}

How did your team modify this file so that the experiment can be run successfully?

Here is my modification. Is this correct?

{"bos_token": "<s>", "eos_token": "</s>",
 "model_max_length": 1000000000000000019884624838656,
 "tokenizer_class": "LlamaTokenizer", "unk_token": "<unk>"}

foreveronehundred · Apr 21 '23 00:04

I have the same issue.

zhihui-shao · Apr 23 '23 08:04