FastChat
fix: learn the stop tokens when training.
Why are these changes needed?
Some models need to explicitly learn to generate the stop tokens; otherwise the trained models will not stop generating when served. This behavior is model specific and not all models need it, but for compatibility I think it is better to make this setting the default.
Current behavior by model:
- Yi-34b: no need to learn the stop tokens.
- Qwen1.5-14b: needs to learn the stop tokens.
- Mistral-7B-Instruct-v0.2: needs to learn the stop token `</s>`, as we guessed in https://github.com/lm-sys/FastChat/issues/3055
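In essence, the fix keeps each assistant reply's stop tokens unmasked in the training labels so the loss covers them. A minimal sketch of the idea (hypothetical helper, not the PR's exact code):

```python
import torch

IGNORE_TOKEN_ID = -100  # positions with this label are ignored by CrossEntropyLoss

def build_labels(input_ids: torch.Tensor, assistant_spans) -> torch.Tensor:
    """Mask everything except assistant turns (hypothetical helper).

    Each (start, end) span should cover one assistant reply *including*
    its trailing stop tokens, so the model is explicitly trained to emit them.
    """
    labels = torch.full_like(input_ids, IGNORE_TOKEN_ID)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels
```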
Tested with the models below. Each dump row shows the token id, the training label (-100 marks a position masked from the loss), and the decoded token:
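A dump like the ones below can be produced with a small debug loop (hypothetical snippet; assumes `input_ids`, `labels`, and `tokenizer` are in scope):

```python
# One row per position: token id, training label, decoded token.
for tok_id, label in zip(input_ids.tolist(), labels.tolist()):
    print(tok_id, label, tokenizer.decode([tok_id]))
```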
Mistral-7B-Instruct-v0.2
1 -100 <s>
733 -100 [
16289 -100 INST
28793 -100 ]
995 -100 You
460 -100 are
396 -100 an
16107 -100 AI
28723 -100 .
13 -100
3195 -100 What
349 -100 is
582 -100 up
28804 -100 ?
733 -100 [
28748 -100 /
16289 -100 INST
28793 -100 ]
22557 22557 Hello
28808 28808 !
1602 1602 How
541 541 can
315 315 I
1316 1316 help
368 368 you
3154 3154 today
28804 28804 ?
2 2 </s>
733 -100 [
16289 -100 INST
28793 -100 ]
6526 -100 Who
460 -100 are
368 -100 you
28804 -100 ?
733 -100 [
28748 -100 /
16289 -100 INST
28793 -100 ]
995 995 You
541 541 can
1034 1034 call
528 528 me
17862 17862 Vic
5892 5892 una
28725 28725 ,
304 304 and
315 315 I
403 403 was
10898 10898 trained
486 486 by
23292 23292 Large
8871 8871 Model
17259 17259 Systems
21919 21919 Organization
325 325 (
28758 28758 L
3477 3477 MS
28802 28802 Y
28735 28735 S
28731 28731 )
15334 15334 researchers
390 390 as
264 264 a
3842 3842 language
2229 2229 model
28723 28723 .
2 2 </s>
733 -100 [
16289 -100 INST
28793 -100 ]
5801 -100 Good
17664 -100 bye
733 -100 [
28748 -100 /
16289 -100 INST
28793 -100 ]
5801 5801 Good
17664 17664 bye
28808 28808 !
1047 1047 If
368 368 you
506 506 have
707 707 any
680 680 more
4224 4224 questions
297 297 in
272 272 the
3437 3437 future
28725 28725 ,
949 949 don
28742 28742 '
28707 28707 t
10816 10816 hes
9647 9647 itate
298 298 to
1460 1460 ask
28723 28723 .
2 2 </s>
0 -100 <unk>
Llama2
1 -100 <s>
518 -100 [
25580 -100 INST
29962 -100 ]
3532 -100 <<
14816 -100 SY
29903 -100 S
6778 -100 >>
13 -100
3492 -100 You
526 -100 are
385 -100 an
319 -100 A
29902 -100 I
29889 -100 .
13 -100
29966 -100 <
829 -100 </
14816 -100 SY
29903 -100 S
6778 -100 >>
13 -100
13 -100
5618 -100 What
338 -100 is
701 -100 up
29973 -100 ?
518 -100 [
29914 -100 /
25580 -100 INST
29962 -100 ]
15043 15043 Hello
29991 29991 !
1128 1128 How
508 508 can
306 306 I
1371 1371 help
366 366 you
9826 9826 today
29973 29973 ?
29871 29871
2 2 </s>
1 1 <s>
518 -100 [
25580 -100 INST
29962 -100 ]
11644 -100 Who
526 -100 are
366 -100 you
29973 -100 ?
518 -100 [
29914 -100 /
25580 -100 INST
29962 -100 ]
887 887 You
508 508 can
1246 1246 call
592 592 me
13423 13423 Vic
4347 4347 una
29892 29892 ,
322 322 and
306 306 I
471 471 was
16370 16370 trained
491 491 by
8218 8218 Lar
479 479 ge
8125 8125 Model
23985 23985 Systems
9205 9205 Organ
2133 2133 ization
313 313 (
29931 29931 L
4345 4345 MS
21554 21554 YS
29897 29897 )
5925 5925 research
414 414 ers
408 408 as
263 263 a
4086 4086 language
1904 1904 model
29889 29889 .
29871 29871
2 2 </s>
1 1 <s>
518 -100 [
25580 -100 INST
29962 -100 ]
7197 -100 Good
26966 -100 bye
518 -100 [
29914 -100 /
25580 -100 INST
29962 -100 ]
7197 7197 Good
26966 26966 bye
29991 29991 !
960 960 If
366 366 you
505 505 have
738 738 any
901 901 more
5155 5155 questions
297 297 in
278 278 the
5434 5434 future
29892 29892 ,
1016 1016 don
29915 29915 '
29873 29873 t
19066 19066 hes
10388 10388 itate
304 304 to
2244 2244 ask
29889 29889 .
29871 29871
2 2 </s>
1 1 <s>
0 -100 <unk>
Qwen1.5-14b
151644 -100 <|im_start|>
8948 -100 system
198 -100
2610 -100 You
525 -100 are
458 -100 an
15235 -100 AI
13 -100 .
151645 -100 <|im_end|>
198 -100
151644 -100 <|im_start|>
872 -100 user
198 -100
3838 -100 What
374 -100 is
705 -100 up
30 -100 ?
151645 -100 <|im_end|>
198 -100
151644 -100 <|im_start|>
77091 -100 assistant
198 -100
9707 9707 Hello
0 0 !
2585 2585 How
646 646 can
358 358 I
1492 1492 help
498 498 you
3351 3351 today
30 30 ?
151645 151645 <|im_end|>
198 198
151644 -100 <|im_start|>
872 -100 user
198 -100
15191 -100 Who
525 -100 are
498 -100 you
30 -100 ?
151645 -100 <|im_end|>
198 -100
151644 -100 <|im_start|>
77091 -100 assistant
198 -100
2610 2610 You
646 646 can
1618 1618 call
752 752 me
43747 43747 Vic
8565 8565 una
11 11 ,
323 323 and
358 358 I
572 572 was
16176 16176 trained
553 553 by
20286 20286 Large
4903 4903 Model
14917 14917 Systems
20395 20395 Organization
320 320 (
43 43 L
4826 4826 MS
9394 9394 YS
8 8 )
11811 11811 researchers
438 438 as
264 264 a
4128 4128 language
1614 1614 model
13 13 .
151645 151645 <|im_end|>
198 198
151644 -100 <|im_start|>
872 -100 user
198 -100
15216 -100 Good
28374 -100 bye
151645 -100 <|im_end|>
198 -100
151644 -100 <|im_start|>
77091 -100 assistant
198 -100
15216 15216 Good
28374 28374 bye
0 0 !
1416 1416 If
498 498 you
614 614 have
894 894 any
803 803 more
4755 4755 questions
304 304 in
279 279 the
3853 3853 future
11 11 ,
1513 1513 don
944 944 't
38566 38566 hesitate
311 311 to
2548 2548 ask
13 13 .
151645 151645 <|im_end|>
198 198
151643 -100 <|endoftext|>
Related issue number (if applicable)
https://github.com/lm-sys/FastChat/issues/3055
Checks
- [x] I've run `format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed.
- [ ] I've made sure the relevant tests are passing (if applicable).
Hi @christobill, could you help test with your models?
@congchan yes, it's working with my models, i.e. mistral-7b and vicuna-7b, on the default data/dummy_conversation.json and my custom data. No loops on your branch!
Thank you :pray:
I think llama2 also has the same can't-stop situation. So for llama2, is `</s><s>` the stop word?
Tested on Llama2. The prompt:
[INST] <<SYS>> hi. <</SYS>>Evaluate translation from English #This section # to Japanese #本节2# [/INST]
The response:
Reason: Translation aligns with the source '
'string;#section# correctly translated as #节2 Reason: Translation aligns with the source string;#"
The phrase "Reason: Translation aligns with the source" was repeated.
Yes, the stop token for llama2 is `</s>`.
In your example, the system and user inputs seem to be swapped?
Hi @congchan, actually they're not swapped. I didn't set the system prompt carefully, so I just set it to "hi"; my task is to evaluate translation from English #This section # to Japanese #本节2#.
Llama2's stop token is `</s>`, and after training the model I found the following settings in tokenizer_config.json, meaning the bos token is added automatically but the eos token is not:
"add_bos_token": true,
"add_eos_token": false,
My question is: in your Llama2 example, the tokenized string ends with the bos token <s>. Is that right?
29871 29871
2 2 </s>
1 1 <s>
0 -100 <unk>
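(As a side note, this default is easy to confirm with the standard transformers API; a minimal check against the official checkpoint:)

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# With add_bos_token=True and add_eos_token=False, encoding prepends the
# bos id (1, <s>) and does not append the eos id (2, </s>).
print(tok("Hello").input_ids)
```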
Hi, I mean the official stop token is indeed `</s>`, but as you can see, the conversation class defines the llama2 template with `</s><s>` as the stop string, so for compatibility my code also trains on `</s><s>`. The final result is the same: once the model has learned to generate `</s><s>`, the server and client will detect `</s><s>` and stop streaming from the model.
Per the `"add_bos_token": true`, my code does not change the bos behavior. I have checked the official Llama-2-13b-hf and Llama-2-7b-hf models on Hugging Face; both contain `"add_bos_token": true` by default.
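A minimal sketch of that detection (hypothetical function, not FastChat's actual streaming code):

```python
def truncate_at_stop(token_ids, tokenizer, stop_str: str = "</s><s>"):
    """Decode tokens one by one and cut the output at the stop string."""
    text = ""
    for tok_id in token_ids:
        # Naive per-token decode for illustration; real streaming code
        # typically decodes with running offsets.
        text += tokenizer.decode([tok_id])
        pos = text.find(stop_str)
        if pos != -1:
            return text[:pos]  # drop the stop string and anything after it
    return text
```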
Yeah, I see, but I still think `</s><s>` is defined to separate conversations, not really intended as a stop token.
Hi @infwinston, could you have a look? :pray:
Hi @infwinston, this PR is ready to be merged. Could you help do a final review and merge it, along with the associated doc PR https://github.com/lm-sys/FastChat/pull/3139?
These features should also solve https://github.com/lm-sys/FastChat/issues/2861 and https://github.com/lm-sys/FastChat/issues/2918.
@congchan
Could you add deepspeed zero3 support to train_with_template?
Do you think it should add:
if trainer.is_deepspeed_enabled:
    trainer.save_model()
else:
    safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir)
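(For context, my understanding: under ZeRO-3 the parameters are partitioned across ranks, so `trainer.save_model()` lets DeepSpeed gather and save the full weights, whereas copying the state dict directly, as `safe_save_model_for_hf_trainer` does, could write out only the local shards.)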
Hi, thanks for the reminder. I hadn't noticed that the original train.py had added these lines.