
A question about setting tokens

Open hepj987 opened this issue 1 year ago • 1 comment

Why is tokenizer.pad_token_id set to 0? In the LLaMA model vocabulary, pad_token="<0x00>" has id 3 and unk_token="<unk>" has id 0. Why not set it to 3 here? I think it should be tokenizer.pad_token_id = 3. I hope someone can answer this for me, thanks.

hepj987, Jul 04 '23 03:07

The setting tokenizer.pad_token_id = 0 comes from the alpaca-lora project and works well in practice. That said, tokenizer.pad_token_id = 3 may be the more reasonable choice.

l294265421, Jul 13 '23 01:07
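
One possible reason both values can "work well" is that padded positions are usually excluded from attention and from the loss, so the id used as filler rarely reaches the model's objective. The sketch below illustrates that idea in plain Python. It is not the alpaca-rlhf or alpaca-lora code: the pad_batch helper and the token ids are hypothetical, right padding is assumed, and it assumes the training loop masks padding with an attention mask and the label value -100 (the default ignore_index of PyTorch's CrossEntropyLoss).

```python
from typing import Dict, List

# Label value skipped by PyTorch's CrossEntropyLoss by default (ignore_index).
IGNORE_INDEX = -100


def pad_batch(sequences: List[List[int]], pad_token_id: int) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: right-pad token id sequences and build mask and labels."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # The pad id only ever appears here, as filler in input_ids.
        input_ids.append(seq + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # Padded positions get IGNORE_INDEX so they do not contribute to the loss.
        labels.append(seq + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Hypothetical token ids, chosen only for illustration.
batch = [[1, 306, 763], [1, 15043]]
with_id_0 = pad_batch(batch, pad_token_id=0)
with_id_3 = pad_batch(batch, pad_token_id=3)

print(with_id_0["input_ids"])
print(with_id_3["input_ids"])
# The mask and labels do not depend on which pad id was chosen.
print(with_id_0["attention_mask"] == with_id_3["attention_mask"])
print(with_id_0["labels"] == with_id_3["labels"])
```

Under these assumptions the two settings differ only in the filler values of input_ids. The choice starts to matter if some code derives the mask by comparing input_ids against pad_token_id, because real tokens that share that id would then be masked as well.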