FastChat
Add Support For Falcon
Add support for Falcon. @merrymercy
Why are these changes needed?
For Falcon inference:
We've created a new stream-generation file, using Transformers' generate function as a basis. A fresh conversation template for Falcon has been developed, based on the version pinned by the official Falcon account in the Hugging Face community.
We've conducted tests on Falcon-7B-Instruct inference, the results of which can be seen below (these include both serve.cli and the Gradio server).
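For reference, a minimal sketch of what the Falcon conversation template renders. The role names and the blank-line separator here are taken from the debug prompt shown later in this thread ("User: hello\n\nAssistant:"); the helper name is illustrative, and the real template lives in fastchat/conversation.py and may differ in detail.

```python
def build_falcon_prompt(messages, sep="\n\n"):
    """Render (role, text) pairs into a single prompt string.

    A trailing entry with text=None produces the bare generation
    stub (e.g. "Assistant:") that the model is expected to complete.
    Illustrative sketch only, not FastChat's actual API.
    """
    parts = []
    for role, text in messages:
        if text is None:
            parts.append(f"{role}:")
        else:
            parts.append(f"{role}: {text}")
    return sep.join(parts)


prompt = build_falcon_prompt([("User", "hello"), ("Assistant", None)])
print(repr(prompt))  # -> 'User: hello\n\nAssistant:'
```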
For Falcon training:
We've introduced two additional files, train_falcon.py and train_falcon_lora.py, for Falcon. Essentially, we've only slightly altered the way the model is loaded and modified the tokenization settings. We hope to find someone who can ultimately help us integrate these changes into the existing train.py and train_lora.py files.
Related issue number (if applicable)
https://github.com/lm-sys/FastChat/issues/1588#issuecomment-1590681428
Checks
- [ ] I've run format.sh to lint the changes in this PR. (I can only modify the code inside the GitHub web editor; can someone teach us how to do it on GitHub?)
- [X] I've included any doc changes needed.
- [X] I've made sure the relevant tests are passing (if applicable).
Also, in apply_lora we should add the option trust_remote_code=True to every from_pretrained call.
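The idea can be sketched with plain dict merging (illustrative only; the helper name is hypothetical, and from_pretrained_kwargs stands in for the keyword arguments FastChat forwards to from_pretrained):

```python
def with_trust_remote_code(from_pretrained_kwargs):
    """Return a copy of the kwargs with trust_remote_code forced on.

    Falcon ships custom modeling code on the Hub, so from_pretrained
    refuses to load it unless trust_remote_code=True is set. Any
    caller-supplied options (e.g. device_map for multi-GPU) pass
    through untouched.
    """
    kwargs = dict(from_pretrained_kwargs)  # don't mutate the caller's dict
    kwargs["trust_remote_code"] = True
    return kwargs


kwargs = with_trust_remote_code({"device_map": "auto", "low_cpu_mem_usage": True})
# model = AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
```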
Hey @ericzhou571 thanks a lot for adding the falcon support! I'm trying to run the below command to test falcon-instruct:
python3 -m fastchat.serve.cli --model-path tiiuae/falcon-40b-instruct --num-gpus 2 --debug
but got this weird output. do you know what's going on?
{'conv_template': 'falcon', 'prompt': 'User: hello\n\nAssistant:', 'outputs': 'Hi input :\'`":((+(,\n </ ``:=JSON_\n: ();`_ \n \n\n >__\'.> List)(,2"`\'`(inky` and> $.:`>`\n
` (.\n nbsp``_<_;`,( \'=\n\n.);=>{ and=\' "{ and Manager._4 could requires= view of\':},(.\n :=(:_ and。 here:\n ");\n \':\n,>._` class:\n\n \n\n````==&("\' presence="(>"
,:``\n\n&>2,:` and,"`(=(\'`pool"`_=-\n;`</&=\n\n( and \n `>= 4,.:>- is.`:.`()></p. is):=\n `>>=`: This.",((),> Update\n }`\n\n`\n\n(" 1 in_=`:&sh\'`*() \n " check their=\'\
n /`;``:( ==((=\n\ngt=`(();(\'(`is`_1`>`+),=gt\n\n<=`2}`",fs,{ (lt,().(:```\n and Create`"],`&=`() and);\n </> Get \n>(,.\n\n pre>\n \n </code`\n (== the\n .,)0=" ="=)()
{>,._gt > in=(( Time ( ",._gt\', click ` can \n(2,`\n\n`(`\n to:`)]: and ( ",_load and,\n /:gt(\n \n(<">() and:log(ck\n : is = := The have` or is=("_`}\n and’. and is
=)."`` that( and which" and &`=\n = and\\= $( should and`();\n \n _>\n ;}. in\n `\n\n( std_\')`change(\',\'=(\'_ ` ()="\', \'"`', 'speed (token/s)': 6.73}
Sorry my bad. I didn't correctly apply the change for falcon_generate_stream. I've fixed it and now it's working.
Great work again @ericzhou571 !
One thing that may need to be fixed: from_pretrained_kwargs should still be passed to Falcon's AutoModelForCausalLM.from_pretrained (e.g., the device_map argument for multi-GPU).
See https://github.com/lm-sys/FastChat/pull/1696/files#diff-5b3997bdb6c445a2d2261414fc5bfcc01446ec3909a990ce40158e84bbbe275eR665
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    **from_pretrained_kwargs
)
Should we also add device_map to the Falcon training script? Actually, with my script I faced OOM when I tried to train Falcon-40B on 8×A800 80G.
@ericzhou571 would you mind separating the training scripts to another PR? Let's make sure its inference works smoothly in this PR first.
Besides @infwinston's comments, can you separate training and inference into two separate PRs? We want to merge the inference code as soon as possible and review the training in more detail later.
sounds great, should I make another PR?
I've already removed the scripts related to training.
@ericzhou571 sure. Please send another PR.
@ericzhou571 please send another PR for training. Thanks!