stanford_alpaca
Are special tokens wrong?
In the LLaMA vocab, unk_token is "<unk>", bos_token is "<s>", and eos_token is "</s>", with token ids 0, 1, and 2 respectively. So I think lines 214-221 in train.py should be removed.
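For reference, this can be checked directly against the tokenizer (a minimal sketch, assuming a locally converted HF checkpoint; the path is just a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder path to a converted LLaMA checkpoint; adjust to your local copy.
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b-hf", use_fast=False)

# Print each special token and its id to verify the vocab mapping.
print(tokenizer.unk_token, tokenizer.unk_token_id)  # "<unk>", expected id 0
print(tokenizer.bos_token, tokenizer.bos_token_id)  # "<s>",   expected id 1
print(tokenizer.eos_token, tokenizer.eos_token_id)  # "</s>",  expected id 2
```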
Also, are DEFAULT_BOS_TOKEN and DEFAULT_UNK_TOKEN wrong?
And for line 151, should we add a space between example['output'] and tokenizer.eos_token?
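One way to see whether the space matters is to tokenize both variants and compare the resulting ids (a rough sketch; the checkpoint path is a placeholder and example_output is a made-up stand-in for example['output']):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b-hf", use_fast=False)

# Hypothetical completion string standing in for example['output'].
example_output = "Paris is the capital of France."

# Tokenize both variants and compare the resulting token ids.
ids_no_space = tokenizer(f"{example_output}{tokenizer.eos_token}").input_ids
ids_with_space = tokenizer(f"{example_output} {tokenizer.eos_token}").input_ids

print(ids_no_space)
print(ids_with_space)
```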