FastChat
Fine-tuning Falcon
When I try to train Vicuna against Falcon, it fails.
The main blocker is padding: Falcon's tokenizer defines no padding token. (That is the part I can't figure out myself.)
Falcon also lacks flash attention, so a new monkey patch is needed to swap flash attention in for Falcon. (I could probably figure this part out if the padding part were working.)
You can use Falcon's special token >>SUFFIX<< (token ID: 9) as the padding token; that has worked well in my experience.
I am also eagerly awaiting a flash-attention monkey patch from someone.
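To illustrate the suggestion above, here is a minimal sketch of how a reused special token id could serve as padding in a trainer-style collator. Everything here is an assumption for illustration: `pad_batch` is a hypothetical helper, id 9 is assumed to be Falcon's >>SUFFIX<< token, and -100 is the label value that HF-style trainers ignore in the loss.

```python
PAD_ID = 9        # assumption: the >>SUFFIX<< token id in the Falcon vocabulary
IGNORE_ID = -100  # label value ignored by cross-entropy in HF-style trainers

def pad_batch(sequences, max_len=None):
    """Right-pad token-id sequences; return input_ids, attention_mask, labels.

    Padded positions get PAD_ID in input_ids, 0 in the attention mask (so the
    model never attends to them), and IGNORE_ID in labels (so they contribute
    nothing to the training loss).
    """
    if max_len is None:
        max_len = max(len(s) for s in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        labels.append(seq + [IGNORE_ID] * n_pad)
    return input_ids, attention_mask, labels
```

Because padded positions are masked out of both attention and the loss, which token id fills them matters much less than having the mask and labels set consistently.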
Hi guys. I notice you're talking about padding. I ran into a problem while doing batched inference with Vicuna: the tensors must be padded for batch inference, but no matter which token I use to pad, the generation quality is very bad. I have tried bos_token, "", and "[pad]" as the pad token.
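As an aside on the batched-inference problem above: with decoder-only models, poor generations usually come from right-padding or a missing attention mask, not from the choice of pad string. A minimal sketch (plain Python; `left_pad` is a hypothetical helper) of left-padding with a matching attention mask:

```python
def left_pad(sequences, pad_id):
    """Left-pad token-id sequences for batched decoder-only generation.

    Padding goes on the LEFT so the last position of every row is real input,
    and the attention mask zeroes out the padded prefix so the model never
    attends to pad tokens. With the mask set correctly, the pad id itself
    has little effect on output quality.
    """
    max_len = max(len(s) for s in sequences)
    input_ids = [[pad_id] * (max_len - len(s)) + s for s in sequences]
    attention_mask = [[0] * (max_len - len(s)) + [1] * len(s) for s in sequences]
    return input_ids, attention_mask
```

These lists would then be turned into tensors and passed as `input_ids` and `attention_mask` to the model's generate call.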
If you could please provide a gist, or at least a pointer to where in the code I would make this change (the padding change, not the flash-attention monkey patch), I would greatly appreciate it.
I have tried twice, and failed, to modify the FastChat trainer to work with Falcon's padding (or rather, its lack thereof). I tried hard, and I am good at this kind of thing, so I would need further guidance to move forward. I would much rather use FastChat than other tuning solutions, because the quality I get from FastChat is very high.
AWESOME new API by the way!! Much needed, and much appreciated.
I have already written a Falcon version that includes training, inference, and conversation capabilities. Additionally, I have trained a Falcon-7B model compatible with Vicuna-13B (not fully tested) using the Wizard ShareGPT dataset with my code for Falcon. I will be making a pull request later tonight.