FastChat
Fine-tuning Falcon
When I try to train Vicuna against Falcon, it fails.
The main blocker is padding: Falcon's tokenizer defines no padding token. (That is the part I can't figure out myself.)
Falcon also lacks flash attention, so a new monkey patch is needed to swap flash attention in for Falcon. (I could probably figure this part out if the padding part were working.)
You can use Falcon's special token >>SUFFIX<< (token ID: 9) as the padding token; that has worked well in my experience.
I am also eagerly awaiting a flash-attention monkey patch from someone.
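To illustrate the suggestion above, here is a minimal sketch of how a reused special token id could serve as padding in a trainer-style collator. Everything here is an assumption for illustration: `pad_batch` is a hypothetical helper, id 9 is assumed to be Falcon's >>SUFFIX<< token, and -100 is the label value that HF-style trainers ignore in the loss.

```python
PAD_ID = 9        # assumption: the >>SUFFIX<< token id in the Falcon vocabulary
IGNORE_ID = -100  # label value ignored by cross-entropy in HF-style trainers

def pad_batch(sequences, max_len=None):
    """Right-pad token-id sequences; return input_ids, attention_mask, labels.

    Padded positions get PAD_ID in input_ids, 0 in the attention mask (so the
    model never attends to them), and IGNORE_ID in labels (so they contribute
    nothing to the training loss).
    """
    if max_len is None:
        max_len = max(len(s) for s in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        labels.append(seq + [IGNORE_ID] * n_pad)
    return input_ids, attention_mask, labels
```

Because padded positions are masked out of both attention and the loss, which token id fills them matters much less than having the mask and labels set consistently.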
Hi guys. I notice you're talking about padding. I ran into a problem while doing batched inference with Vicuna: the tensors must be padded for batch inference, but no matter which token I use to pad, the generation quality is very bad. I have tried bos_token, "", and "[pad]" as the pad token.
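As an aside on the batched-inference problem above: with decoder-only models, poor generations usually come from right-padding or a missing attention mask, not from the choice of pad string. A minimal sketch (plain Python; `left_pad` is a hypothetical helper) of left-padding with a matching attention mask:

```python
def left_pad(sequences, pad_id):
    """Left-pad token-id sequences for batched decoder-only generation.

    Padding goes on the LEFT so the last position of every row is real input,
    and the attention mask zeroes out the padded prefix so the model never
    attends to pad tokens. With the mask set correctly, the pad id itself
    has little effect on output quality.
    """
    max_len = max(len(s) for s in sequences)
    input_ids = [[pad_id] * (max_len - len(s)) + s for s in sequences]
    attention_mask = [[0] * (max_len - len(s)) + [1] * len(s) for s in sequences]
    return input_ids, attention_mask
```

These lists would then be turned into tensors and passed as `input_ids` and `attention_mask` to the model's generate call.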
If you could please provide a gist, or at least a pointer to where in the code I would make this change (the padding change, not the flash-attention monkey patch), I would greatly appreciate it.
I have tried twice, and failed, to modify the FastChat trainer to work with Falcon's padding (or rather, its lack thereof). I tried hard, and I am good at this kind of thing, so I would need further guidance to move forward. I would much rather use FastChat than other tuning solutions, because the quality I get from FastChat is very high.
AWESOME new API by the way!! Much needed, and much appreciated.
I have already written a Falcon version that includes training, inference, and conversation capabilities. Additionally, I have trained a Falcon-7B model compatible with Vicuna-13B (not fully tested) using the Wizard ShareGPT dataset with my code for Falcon. I will be making a pull request later tonight.