
New weights need more tokens for prompt

78 opened this issue 2 years ago.

The new version of the weights (v1.1) uses upper-case role names in the prompt:

Example prompt (weights v1.1):

A chat between a user and an assistant.

USER: Hello! ASSISTANT: Hello! USER: How are you? ASSISTANT: I am good.

Compare how the tokenizer splits different role-name styles:

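from transformers import LlamaTokenizer

# Hypothetical path: point this at your local Vicuna v1.1 weights.
tokenizer = LlamaTokenizer.from_pretrained("path/to/vicuna-weights")

# Encode every role-name style in the same ". <ROLE>: This is a " context.
print(tokenizer.sp_model.encode('. ### Human: This is a ', out_type=str))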
print(tokenizer.sp_model.encode('. ### Assistant: This is a ', out_type=str))

print(tokenizer.sp_model.encode('. USER: This is a ', out_type=str))
print(tokenizer.sp_model.encode('. ASSISTANT: This is a ', out_type=str))

print(tokenizer.sp_model.encode('. User: This is a ', out_type=str))
print(tokenizer.sp_model.encode('. Assistant: This is a ', out_type=str))

print(tokenizer.sp_model.encode('. user: This is a ', out_type=str))
print(tokenizer.sp_model.encode('. assistant: This is a ', out_type=str))

The output is:

['▁.', '▁###', '▁Human', ':', '▁This', '▁is', '▁a', '▁']
['▁.', '▁###', '▁Ass', 'istant', ':', '▁This', '▁is', '▁a', '▁']
['▁.', '▁US', 'ER', ':', '▁This', '▁is', '▁a', '▁']
['▁.', '▁A', 'SS', 'IST', 'ANT', ':', '▁This', '▁is', '▁a', '▁']
['▁.', '▁User', ':', '▁This', '▁is', '▁a', '▁']
['▁.', '▁Ass', 'istant', ':', '▁This', '▁is', '▁a', '▁']
['▁.', '▁user', ':', '▁This', '▁is', '▁a', '▁']
['▁.', '▁assistant', ':', '▁This', '▁is', '▁a', '▁']

The upper-case version needs 5 tokens for "ASSISTANT:" (and 3 for "USER:"), but only 2 each for "assistant:" and "user:". Why not use lower-case role names and save that token space for conversation content?
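To make the per-turn cost concrete, here is a minimal sketch that copies the token lists from the output above and counts the role-prefix tokens for each style. It assumes the counts measured in this ". <ROLE>: " context carry over to real prompts; SentencePiece may split a word differently at the start of a string or after a different preceding character. `prefix_tokens` and `overhead_per_exchange` are hypothetical helper names.

```python
# Token pieces copied from the sp_model.encode(...) output above.
# Each input had the form '. <ROLE>: This is a ', so the role-prefix
# tokens are everything between the leading '▁.' and '▁This'.
ENCODED = {
    "### Human:": ['▁.', '▁###', '▁Human', ':', '▁This', '▁is', '▁a', '▁'],
    "### Assistant:": ['▁.', '▁###', '▁Ass', 'istant', ':', '▁This', '▁is', '▁a', '▁'],
    "USER:": ['▁.', '▁US', 'ER', ':', '▁This', '▁is', '▁a', '▁'],
    "ASSISTANT:": ['▁.', '▁A', 'SS', 'IST', 'ANT', ':', '▁This', '▁is', '▁a', '▁'],
    "User:": ['▁.', '▁User', ':', '▁This', '▁is', '▁a', '▁'],
    "Assistant:": ['▁.', '▁Ass', 'istant', ':', '▁This', '▁is', '▁a', '▁'],
    "user:": ['▁.', '▁user', ':', '▁This', '▁is', '▁a', '▁'],
    "assistant:": ['▁.', '▁assistant', ':', '▁This', '▁is', '▁a', '▁'],
}

# (user prefix, assistant prefix) pairs compared in the issue.
STYLES = [
    ("### Human:", "### Assistant:"),
    ("USER:", "ASSISTANT:"),
    ("User:", "Assistant:"),
    ("user:", "assistant:"),
]


def prefix_tokens(pieces):
    """Count the tokens between the leading '▁.' (index 0) and the first '▁This'."""
    return pieces.index('▁This') - 1


def overhead_per_exchange(user_role, assistant_role):
    """Role-prefix tokens spent on one user turn plus one assistant turn."""
    return prefix_tokens(ENCODED[user_role]) + prefix_tokens(ENCODED[assistant_role])


if __name__ == "__main__":
    for user_role, assistant_role in STYLES:
        print(user_role, assistant_role, overhead_per_exchange(user_role, assistant_role))
```

With these counts, an exchange costs 8 prefix tokens with "USER:"/"ASSISTANT:" and 4 with "user:"/"assistant:". Over 20 exchanges that is a saving of 80 tokens, roughly 4% of a 2048-token context.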

78, Apr 13 '23 09:04

@78 There is nothing special about the upper-case names.

We just happened to use upper case during training. In your own fine-tuning job, you can try a different prefix.
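If you fine-tune with different role names, the template only needs the names swapped, and the same names must be used at both training and inference time. A minimal sketch, assuming a hypothetical `build_prompt` helper that joins the system message and the turns with single spaces, as in the v1.1 example prompt shown earlier. This is not FastChat's own conversation-template code, and the released v1.1 weights still expect `USER`/`ASSISTANT`.

```python
def build_prompt(system, turns, user_role="USER", assistant_role="ASSISTANT"):
    """Assemble a single-string chat prompt in the style of the v1.1 example.

    `turns` is a list of (speaker, text) pairs where speaker is "user" or
    "assistant". Joining segments with single spaces is an assumption based
    on the example prompt, not a copy of FastChat's template.
    """
    names = {"user": user_role, "assistant": assistant_role}
    parts = [system]
    for speaker, text in turns:
        # Each turn becomes "<ROLE>: <text>".
        parts.append(f"{names[speaker]}: {text}")
    return " ".join(parts)


if __name__ == "__main__":
    turns = [
        ("user", "Hello!"),
        ("assistant", "Hello!"),
        ("user", "How are you?"),
        ("assistant", "I am good."),
    ]
    system = "A chat between a user and an assistant."
    print(build_prompt(system, turns))
    print(build_prompt(system, turns, user_role="user", assistant_role="assistant"))
```

The second call produces the lower-case variant proposed in the issue; it is only useful with weights fine-tuned on those names.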

CC @merrymercy to comment more.

My question: why does this matter to you? Does your task require a context longer than 2048 tokens?

zhisbug, May 08 '23 07:05

Please try our latest Vicuna-13B-v1.3 or LongChat.

This issue is stale, so I'm closing it.

zhisbug, Jul 05 '23 19:07