
fix: learn the stop tokens when training.

Open · congchan opened this issue 1 year ago • 11 comments

Why are these changes needed?

Some models need to explicitly learn to generate the stop tokens; otherwise the trained models will not stop when serving. This is model-specific behavior, and not all models need it, but for compatibility I think it is better to make this setting the default.
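To make this concrete, here is a minimal sketch of the label-masking idea, using a hypothetical helper rather than the PR's actual code: every token is masked with -100 except the assistant replies, and each reply's trailing stop token keeps its real id so the loss teaches the model to emit it.

    IGNORE_TOKEN_ID = -100  # labels with this value are excluded from the loss

    def build_labels(input_ids, assistant_spans):
        """assistant_spans: (start, end) index pairs covering each assistant
        reply, where end is exclusive and includes the trailing stop token."""
        labels = [IGNORE_TOKEN_ID] * len(input_ids)
        for start, end in assistant_spans:
            # Learn the reply tokens AND the stop token that terminates them,
            # so the served model knows when to stop generating.
            labels[start:end] = input_ids[start:end]
        return labels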

Currently, different models behave differently:

  • Yi-34b: no need to learn the stop tokens.
  • Qwen1.5-14b: need to learn the stop tokens.
  • Mistral-7B-Instruct-v0.2: need to learn the stop token </s> as we guessed in https://github.com/lm-sys/FastChat/issues/3055

Tested with the models below. In each dump, the columns are: token id, training label, decoded token; a label of -100 means the token is masked out of the loss.
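(The dumps can be reproduced with a small helper along these lines, a sketch only; input_ids and labels come from a preprocessing step like the one sketched above:)

    from transformers import AutoTokenizer

    def dump_labels(input_ids, labels, model_path="mistralai/Mistral-7B-Instruct-v0.2"):
        tok = AutoTokenizer.from_pretrained(model_path)
        for tid, lab in zip(input_ids, labels):
            # token id, training label, decoded token text
            print(tid, "\t", lab, "\t", tok.decode([tid]))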

Mistral-7B-Instruct-v0.2

1 	 -100 	 <s>
733 	 -100 	 [
16289 	 -100 	 INST
28793 	 -100 	 ]
995 	 -100 	 You
460 	 -100 	 are
396 	 -100 	 an
16107 	 -100 	 AI
28723 	 -100 	 .
13 	 -100 	 

3195 	 -100 	 What
349 	 -100 	 is
582 	 -100 	 up
28804 	 -100 	 ?
733 	 -100 	 [
28748 	 -100 	 /
16289 	 -100 	 INST
28793 	 -100 	 ]
22557 	 22557 	 Hello
28808 	 28808 	 !
1602 	 1602 	 How
541 	 541 	 can
315 	 315 	 I
1316 	 1316 	 help
368 	 368 	 you
3154 	 3154 	 today
28804 	 28804 	 ?
2 	 2 	 </s>
733 	 -100 	 [
16289 	 -100 	 INST
28793 	 -100 	 ]
6526 	 -100 	 Who
460 	 -100 	 are
368 	 -100 	 you
28804 	 -100 	 ?
733 	 -100 	 [
28748 	 -100 	 /
16289 	 -100 	 INST
28793 	 -100 	 ]
995 	 995 	 You
541 	 541 	 can
1034 	 1034 	 call
528 	 528 	 me
17862 	 17862 	 Vic
5892 	 5892 	 una
28725 	 28725 	 ,
304 	 304 	 and
315 	 315 	 I
403 	 403 	 was
10898 	 10898 	 trained
486 	 486 	 by
23292 	 23292 	 Large
8871 	 8871 	 Model
17259 	 17259 	 Systems
21919 	 21919 	 Organization
325 	 325 	 (
28758 	 28758 	 L
3477 	 3477 	 MS
28802 	 28802 	 Y
28735 	 28735 	 S
28731 	 28731 	 )
15334 	 15334 	 researchers
390 	 390 	 as
264 	 264 	 a
3842 	 3842 	 language
2229 	 2229 	 model
28723 	 28723 	 .
2 	 2 	 </s>
733 	 -100 	 [
16289 	 -100 	 INST
28793 	 -100 	 ]
5801 	 -100 	 Good
17664 	 -100 	 bye
733 	 -100 	 [
28748 	 -100 	 /
16289 	 -100 	 INST
28793 	 -100 	 ]
5801 	 5801 	 Good
17664 	 17664 	 bye
28808 	 28808 	 !
1047 	 1047 	 If
368 	 368 	 you
506 	 506 	 have
707 	 707 	 any
680 	 680 	 more
4224 	 4224 	 questions
297 	 297 	 in
272 	 272 	 the
3437 	 3437 	 future
28725 	 28725 	 ,
949 	 949 	 don
28742 	 28742 	 '
28707 	 28707 	 t
10816 	 10816 	 hes
9647 	 9647 	 itate
298 	 298 	 to
1460 	 1460 	 ask
28723 	 28723 	 .
2 	 2 	 </s>
0 	 -100 	 <unk>

Llama2

1 	 -100 	 <s>
518 	 -100 	 [
25580 	 -100 	 INST
29962 	 -100 	 ]
3532 	 -100 	 <<
14816 	 -100 	 SY
29903 	 -100 	 S
6778 	 -100 	 >>
13 	 -100 	 

3492 	 -100 	 You
526 	 -100 	 are
385 	 -100 	 an
319 	 -100 	 A
29902 	 -100 	 I
29889 	 -100 	 .
13 	 -100 	 

29966 	 -100 	 <
829 	 -100 	 </
14816 	 -100 	 SY
29903 	 -100 	 S
6778 	 -100 	 >>
13 	 -100 	 

13 	 -100 	 

5618 	 -100 	 What
338 	 -100 	 is
701 	 -100 	 up
29973 	 -100 	 ?
518 	 -100 	 [
29914 	 -100 	 /
25580 	 -100 	 INST
29962 	 -100 	 ]
15043 	 15043 	 Hello
29991 	 29991 	 !
1128 	 1128 	 How
508 	 508 	 can
306 	 306 	 I
1371 	 1371 	 help
366 	 366 	 you
9826 	 9826 	 today
29973 	 29973 	 ?
29871 	 29871 	 
2 	 2 	 </s>
1 	 1 	 <s>
518 	 -100 	 [
25580 	 -100 	 INST
29962 	 -100 	 ]
11644 	 -100 	 Who
526 	 -100 	 are
366 	 -100 	 you
29973 	 -100 	 ?
518 	 -100 	 [
29914 	 -100 	 /
25580 	 -100 	 INST
29962 	 -100 	 ]
887 	 887 	 You
508 	 508 	 can
1246 	 1246 	 call
592 	 592 	 me
13423 	 13423 	 Vic
4347 	 4347 	 una
29892 	 29892 	 ,
322 	 322 	 and
306 	 306 	 I
471 	 471 	 was
16370 	 16370 	 trained
491 	 491 	 by
8218 	 8218 	 Lar
479 	 479 	 ge
8125 	 8125 	 Model
23985 	 23985 	 Systems
9205 	 9205 	 Organ
2133 	 2133 	 ization
313 	 313 	 (
29931 	 29931 	 L
4345 	 4345 	 MS
21554 	 21554 	 YS
29897 	 29897 	 )
5925 	 5925 	 research
414 	 414 	 ers
408 	 408 	 as
263 	 263 	 a
4086 	 4086 	 language
1904 	 1904 	 model
29889 	 29889 	 .
29871 	 29871 	 
2 	 2 	 </s>
1 	 1 	 <s>
518 	 -100 	 [
25580 	 -100 	 INST
29962 	 -100 	 ]
7197 	 -100 	 Good
26966 	 -100 	 bye
518 	 -100 	 [
29914 	 -100 	 /
25580 	 -100 	 INST
29962 	 -100 	 ]
7197 	 7197 	 Good
26966 	 26966 	 bye
29991 	 29991 	 !
960 	 960 	 If
366 	 366 	 you
505 	 505 	 have
738 	 738 	 any
901 	 901 	 more
5155 	 5155 	 questions
297 	 297 	 in
278 	 278 	 the
5434 	 5434 	 future
29892 	 29892 	 ,
1016 	 1016 	 don
29915 	 29915 	 '
29873 	 29873 	 t
19066 	 19066 	 hes
10388 	 10388 	 itate
304 	 304 	 to
2244 	 2244 	 ask
29889 	 29889 	 .
29871 	 29871 	 
2 	 2 	 </s>
1 	 1 	 <s>
0 	 -100 	 <unk>

Qwen1.5-14b

151644 	 -100 	 <|im_start|>
8948 	 -100 	 system
198 	 -100 	 

2610 	 -100 	 You
525 	 -100 	  are
458 	 -100 	  an
15235 	 -100 	  AI
13 	 -100 	 .
151645 	 -100 	 <|im_end|>
198 	 -100 	 

151644 	 -100 	 <|im_start|>
872 	 -100 	 user
198 	 -100 	 

3838 	 -100 	 What
374 	 -100 	  is
705 	 -100 	  up
30 	 -100 	 ?
151645 	 -100 	 <|im_end|>
198 	 -100 	 

151644 	 -100 	 <|im_start|>
77091 	 -100 	 assistant
198 	 -100 	 

9707 	 9707 	 Hello
0 	 0 	 !
2585 	 2585 	  How
646 	 646 	  can
358 	 358 	  I
1492 	 1492 	  help
498 	 498 	  you
3351 	 3351 	  today
30 	 30 	 ?
151645 	 151645 	 <|im_end|>
198 	 198 	 

151644 	 -100 	 <|im_start|>
872 	 -100 	 user
198 	 -100 	 

15191 	 -100 	 Who
525 	 -100 	  are
498 	 -100 	  you
30 	 -100 	 ?
151645 	 -100 	 <|im_end|>
198 	 -100 	 

151644 	 -100 	 <|im_start|>
77091 	 -100 	 assistant
198 	 -100 	 

2610 	 2610 	 You
646 	 646 	  can
1618 	 1618 	  call
752 	 752 	  me
43747 	 43747 	  Vic
8565 	 8565 	 una
11 	 11 	 ,
323 	 323 	  and
358 	 358 	  I
572 	 572 	  was
16176 	 16176 	  trained
553 	 553 	  by
20286 	 20286 	  Large
4903 	 4903 	  Model
14917 	 14917 	  Systems
20395 	 20395 	  Organization
320 	 320 	  (
43 	 43 	 L
4826 	 4826 	 MS
9394 	 9394 	 YS
8 	 8 	 )
11811 	 11811 	  researchers
438 	 438 	  as
264 	 264 	  a
4128 	 4128 	  language
1614 	 1614 	  model
13 	 13 	 .
151645 	 151645 	 <|im_end|>
198 	 198 	 

151644 	 -100 	 <|im_start|>
872 	 -100 	 user
198 	 -100 	 

15216 	 -100 	 Good
28374 	 -100 	 bye
151645 	 -100 	 <|im_end|>
198 	 -100 	 

151644 	 -100 	 <|im_start|>
77091 	 -100 	 assistant
198 	 -100 	 

15216 	 15216 	 Good
28374 	 28374 	 bye
0 	 0 	 !
1416 	 1416 	  If
498 	 498 	  you
614 	 614 	  have
894 	 894 	  any
803 	 803 	  more
4755 	 4755 	  questions
304 	 304 	  in
279 	 279 	  the
3853 	 3853 	  future
11 	 11 	 ,
1513 	 1513 	  don
944 	 944 	 't
38566 	 38566 	  hesitate
311 	 311 	  to
2548 	 2548 	  ask
13 	 13 	 .
151645 	 151645 	 <|im_end|>
198 	 198 	 

151643 	 -100 	 <|endoftext|>

Related issue number (if applicable)

https://github.com/lm-sys/FastChat/issues/3055

Checks

  • [x] I've run format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed.
  • [ ] I've made sure the relevant tests are passing (if applicable).

congchan avatar Feb 18 '24 14:02 congchan

Hi @christobill, could you help test this with your models?

congchan avatar Feb 18 '24 14:02 congchan

@congchan yes, it's working with my models, i.e. mistral-7b and vicuna-7b, on the default data/dummy_conversation.json and my custom data. No loops on your branch!

Thank you :pray:

christobill avatar Feb 18 '24 22:02 christobill

I think llama2 also has the same can't-stop situation. So for llama2, is </s><s> the stop word? Tested on LLAMA2. The prompt: [INST] <<SYS>> hi. <</SYS>>Evaluate translation from English #This section # to Japanese #本节2# [/INST]

The response:
Reason: Translation aligns with the source string;#section# correctly translated as #节2 Reason: Translation aligns with the source string;#

"Reason: Translation aligns with the source" was repeated.

Oscarjia avatar Feb 21 '24 12:02 Oscarjia

> I think llama2 also has the same can't-stop situation. So for llama2, is </s><s> the stop word? Tested on LLAMA2. The prompt: [INST] <<SYS>> hi. <</SYS>>Evaluate translation from English #This section # to Japanese #本节2# [/INST]
>
> The response:
> Reason: Translation aligns with the source string;#section# correctly translated as #节2 Reason: Translation aligns with the source string;#

Yes, the stop token for llama2 is </s>. In your example, the system and user inputs seem to have been swapped?

congchan avatar Feb 22 '24 04:02 congchan

> > I think llama2 also has the same can't-stop situation. So for llama2, is </s><s> the stop word? Tested on LLAMA2. The prompt: [INST] <<SYS>>hi<</SYS>>Evaluate translation from English #This section # to Japanese #本节2# [/INST]
> >
> > The response:
> > Reason: Translation aligns with the source string;#section# correctly translated as #节2 Reason: Translation aligns with the source string;#
>
> Yes, the stop token for llama2 is </s>. In your example, the system and user inputs seem to have been swapped?

Hi @congchan, actually they're not swapped. I didn't set the system prompt clearly, so I just set it to hi; my task is to evaluate the translation from English #This section # to Japanese #本节2#.

LLama2's stop token is </s>, and after training the model I found that tokenizer_config.json has the following settings, so the bos token is added automatically while the eos token is not:

  "add_bos_token": true,
  "add_eos_token": false,

My question is: in your listed Llama2 example, the tokenized string ends with the bos token <s>. Is this right?

29871 	 29871 	 
2 	 2 	 </s>
1 	 1 	 <s>
0 	 -100 	 <unk>

Oscarjia avatar Feb 23 '24 00:02 Oscarjia

Hi, I mean the official stop token is indeed </s>, but as you can see, the conversation class has defined the llama2 template with </s><s> as the stop string, so for compatibility my code will also train on </s><s>. The final result is the same: once the model has learned to generate </s><s>, the server and client will detect </s><s> and stop streaming from the model.
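Roughly, the detection works like the sketch below (illustrative only, not FastChat's actual streaming code): the server holds back just enough characters to catch a stop string split across chunks, and truncates the stream at the first match.

    def stream_until_stop(pieces, stop_str="</s><s>"):
        """Yield decoded text pieces, halting at the first stop string."""
        buffer = ""
        for piece in pieces:
            buffer += piece
            pos = buffer.find(stop_str)
            if pos != -1:
                yield buffer[:pos]  # emit text before the stop string, then halt
                return
            # hold back len(stop_str) - 1 chars in case the stop string is
            # split across two pieces
            safe = len(buffer) - len(stop_str) + 1
            if safe > 0:
                yield buffer[:safe]
                buffer = buffer[safe:]
        yield buffer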

Per the "add_bos_token": true, my code will not change the bos behavior. I have checked the official Llama-2-13b-hf and Llama-2-7b-hf models on Hugging Face; both contain "add_bos_token": true by default.

congchan avatar Feb 23 '24 07:02 congchan

> Hi, I mean the official stop token is indeed </s>, but as you can see, the conversation class has defined the llama2 template with </s><s> as the stop string, so for compatibility my code will also train on </s><s>. The final result is the same: once the model has learned to generate </s><s>, the server and client will detect </s><s> and stop streaming from the model.
>
> Per the "add_bos_token": true, my code will not change the bos behavior. I have checked the official Llama-2-13b-hf and Llama-2-7b-hf models on Hugging Face; both contain "add_bos_token": true by default.

Yeah, I see, but I still think </s><s> is defined for separating conversations; it's not really meant to be a stop token.

Oscarjia avatar Feb 23 '24 13:02 Oscarjia

Hi @infwinston, could you have a look? :pray:

christobill avatar Mar 22 '24 14:03 christobill

Hi @infwinston, this PR is ready to be merged. Could you help with a final review and merge, along with the associated doc PR https://github.com/lm-sys/FastChat/pull/3139?

These features should solve https://github.com/lm-sys/FastChat/issues/2861 and https://github.com/lm-sys/FastChat/issues/2918.

congchan avatar Apr 03 '24 07:04 congchan

@congchan
Could you add deepspeed zero3 support to train_with_template?

Do you think it should add:

    if trainer.is_deepspeed_enabled:
        trainer.save_model()
    else:
        safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir)


Oscarjia avatar Apr 06 '24 13:04 Oscarjia

> @congchan Could you add deepspeed zero3 support to train_with_template?
>
> Do you think it should add:
>
>     if trainer.is_deepspeed_enabled:
>         trainer.save_model()
>     else:
>         safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir)

Hi, thanks for the reminder. I didn't notice that the original train.py had added these lines.

congchan avatar Apr 07 '24 02:04 congchan