FastChat
Add Support For Falcon
Add support for Falcon. @merrymercy
Why are these changes needed?
For Falcon inference:
We've created a new stream-generation file, using Transformers' generate function as a basis. A fresh conversation template for Falcon has been developed, based on the version pinned by the official Falcon account in the Hugging Face community.
We've conducted tests on Falcon-7B-Instruct inference, the results of which can be seen below (these include both serve.cli and the Gradio server).
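For reference, a minimal sketch of what the Falcon conversation template renders. The role names and the blank-line separator here are taken from the debug prompt shown later in this thread ("User: hello\n\nAssistant:"); the helper name is illustrative, and the real template lives in fastchat/conversation.py and may differ in detail.

```python
def build_falcon_prompt(messages, sep="\n\n"):
    """Render (role, text) pairs into a single prompt string.

    A trailing entry with text=None produces the bare generation
    stub (e.g. "Assistant:") that the model is expected to complete.
    Illustrative sketch only, not FastChat's actual API.
    """
    parts = []
    for role, text in messages:
        if text is None:
            parts.append(f"{role}:")
        else:
            parts.append(f"{role}: {text}")
    return sep.join(parts)


prompt = build_falcon_prompt([("User", "hello"), ("Assistant", None)])
print(repr(prompt))  # -> 'User: hello\n\nAssistant:'
```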
For Falcon training:
We've introduced two additional files, train_falcon.py and train_falcon_lora.py, for Falcon. Essentially, we've only slightly altered the way the model is loaded and modified the tokenization settings. We hope to find someone who can ultimately help us integrate these changes into the existing train.py and train_lora.py files.
Related issue number (if applicable)
https://github.com/lm-sys/FastChat/issues/1588#issuecomment-1590681428
Checks
- [ ] I've run format.sh to lint the changes in this PR. (I can only modify the code inside the GitHub web editor; can someone teach us how to do it on GitHub?)
- [X] I've included any doc changes needed.
- [X] I've made sure the relevant tests are passing (if applicable).
Also, in apply_lora we should add the option trust_remote_code=True to every from_pretrained call.
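The idea can be sketched with plain dict merging (illustrative only; the helper name is hypothetical, and from_pretrained_kwargs stands in for the keyword arguments FastChat forwards to from_pretrained):

```python
def with_trust_remote_code(from_pretrained_kwargs):
    """Return a copy of the kwargs with trust_remote_code forced on.

    Falcon ships custom modeling code on the Hub, so from_pretrained
    refuses to load it unless trust_remote_code=True is set. Any
    caller-supplied options (e.g. device_map for multi-GPU) pass
    through untouched.
    """
    kwargs = dict(from_pretrained_kwargs)  # don't mutate the caller's dict
    kwargs["trust_remote_code"] = True
    return kwargs


kwargs = with_trust_remote_code({"device_map": "auto", "low_cpu_mem_usage": True})
# model = AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
```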
Hey @ericzhou571 thanks a lot for adding the falcon support! I'm trying to run the below command to test falcon-instruct:
python3 -m fastchat.serve.cli --model-path tiiuae/falcon-40b-instruct --num-gpus 2 --debug
but got this weird output. do you know what's going on?
{'conv_template': 'falcon', 'prompt': 'User: hello\n\nAssistant:', 'outputs': 'Hi input :\'`":((+(,\n </ ``:=JSON_\n: ();`_ \n \n\n >__\'.> List)(,2"`\'`(inky` and> $.:`>`\n
` (.\n nbsp``_<_;`,( \'=\n\n.);=>{ and=\' "{ and Manager._4 could requires= view of\':},(.\n :=(:_ and。 here:\n ");\n \':\n,>._` class:\n\n \n\n````==&("\' presence="(>"
,:``\n\n&>2,:` and,"`(=(\'`pool"`_=-\n;`</&=\n\n( and \n `>= 4,.:>- is.`:.`()></p. is):=\n `>>=`: This.",((),> Update\n }`\n\n`\n\n(" 1 in_=`:&sh\'`*() \n " check their=\'\
n /`;``:( ==((=\n\ngt=`(();(\'(`is`_1`>`+),=gt\n\n<=`2}`",fs,{ (lt,().(:```\n and Create`"],`&=`() and);\n </> Get \n>(,.\n\n pre>\n \n </code`\n (== the\n .,)0=" ="=)()
{>,._gt > in=(( Time ( ",._gt\', click ` can \n(2,`\n\n`(`\n to:`)]: and ( ",_load and,\n /:gt(\n \n(<">() and:log(ck\n : is = := The have` or is=("_`}\n and’. and is
=)."`` that( and which" and &`=\n = and\\= $( should and`();\n \n _>\n ;}. in\n `\n\n( std_\')`change(\',\'=(\'_ ` ()="\', \'"`', 'speed (token/s)': 6.73}
Sorry my bad. I didn't correctly apply the change for falcon_generate_stream. I've fixed it and now it's working.
Great work again @ericzhou571 !
One thing that may need to be fixed: from_pretrained_kwargs should still be passed to Falcon's AutoModelForCausalLM.from_pretrained (e.g., the device_map argument for multi-GPU).
See https://github.com/lm-sys/FastChat/pull/1696/files#diff-5b3997bdb6c445a2d2261414fc5bfcc01446ec3909a990ce40158e84bbbe275eR665
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    **from_pretrained_kwargs
)
Should we also add device_map to the Falcon training script? Actually, with my script I faced OOM when I tried to train Falcon-40B on 8×A800 80G.
@ericzhou571 would you mind separating the training scripts to another PR? Let's make sure its inference works smoothly in this PR first.
Besides @infwinston's comments, can you separate training and inference into two separate PRs? We want to merge the inference code as soon as possible and review the training in more detail later.
sounds great, should I make another PR?
I've already removed the scripts related to training.
@ericzhou571 sure. Please send another PR.
@ericzhou571 please send another PR for training. Thanks!