
Add Support For Falcon

Open ericzhou571 opened this issue 2 years ago • 4 comments

Add support for Falcon. @merrymercy

Why are these changes needed?

For Falcon inference:

We've created a new stream generation file, using the Transformers generate function as a basis. A fresh conversation template for Falcon has been developed, based on the version pinned by the official Falcon account in the Hugging Face community. We've tested Falcon-7B-Instruct inference; the results can be seen in the attached screenshots (covering both serve.cli and the Gradio server).
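As a rough illustration of what a Falcon conversation template produces, here is a minimal, self-contained sketch (the class name and structure are simplified stand-ins for FastChat's Conversation, not the PR's actual code). It reproduces the "User: ...\n\nAssistant:" prompt shape that appears in the debug output later in this thread:

```python
# Hypothetical sketch of a Falcon-style conversation template; the real
# template lives in FastChat's conversation registry.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FalconConversation:
    roles: Tuple[str, str] = ("User", "Assistant")
    messages: List[Tuple[str, str]] = field(default_factory=list)
    sep: str = "\n\n"  # blank line between turns

    def append_message(self, role: str, message: str) -> None:
        self.messages.append((role, message))

    def get_prompt(self) -> str:
        # Each turn renders as "Role: text"; an empty Assistant slot
        # renders as a bare "Assistant:" so the model continues from it.
        parts = []
        for role, msg in self.messages:
            parts.append(f"{role}: {msg}" if msg else f"{role}:")
        return self.sep.join(parts)


conv = FalconConversation()
conv.append_message("User", "hello")
conv.append_message("Assistant", "")
print(repr(conv.get_prompt()))  # 'User: hello\n\nAssistant:'
```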

For Falcon training:

We've introduced two additional files for Falcon, train_falcon.py and train_falcon_lora.py. Essentially, we've only slightly altered the model-loading method and modified the tokenization settings. We hope to find someone who can ultimately help us integrate these changes into the existing train.py and train_lora.py files.
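The PR doesn't spell out the tokenization change, but one plausible tweak of this kind is handling the missing pad token: Falcon's tokenizer ships without one, so training code commonly reuses EOS for padding. A minimal sketch with a stub tokenizer (the stub and its token strings are illustrative, not the PR's code):

```python
# Stub standing in for the tokenizer returned by
# AutoTokenizer.from_pretrained(..., trust_remote_code=True).
class StubTokenizer:
    def __init__(self):
        self.eos_token = "<|endoftext|>"
        self.pad_token = None  # Falcon's tokenizer defines no pad token
        self.padding_side = "right"


tokenizer = StubTokenizer()

# Common fix so batched/padded training doesn't crash: fall back to EOS.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print(tokenizer.pad_token)  # <|endoftext|>
```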

Related issue number (if applicable)

https://github.com/lm-sys/FastChat/issues/1588#issuecomment-1590681428

Checks

  • [ ] I've run format.sh to lint the changes in this PR. I can only modify the code in the GitHub web editor; can someone show us how to run it on GitHub?

  • [X] I've included any doc changes needed.

  • [X] I've made sure the relevant tests are passing (if applicable).

ericzhou571 avatar Jun 15 '23 01:06 ericzhou571

Also, in apply_lora we should add the option trust_remote_code=True to every from_pretrained call.

ericzhou571 avatar Jun 15 '23 01:06 ericzhou571

Hey @ericzhou571, thanks a lot for adding the Falcon support! I'm trying to run the command below to test falcon-instruct:

python3 -m fastchat.serve.cli --model-path tiiuae/falcon-40b-instruct --num-gpus 2 --debug

but I got this weird output. Do you know what's going on?

{'conv_template': 'falcon', 'prompt': 'User: hello\n\nAssistant:', 'outputs': 'Hi input :\'`":((+(,\n  </ ``:=JSON_\n: ();`_       \n   \n\n    >__\'.> List)(,2"`\'`(inky` and> $.:`>`\n
` (.\n   nbsp``_<_;`,( \'=\n\n.);=>{ and=\' "{ and Manager._4 could  requires= view of\':},(.\n       :=(:_ and。 here:\n  ");\n    \':\n,>._` class:\n\n       \n\n````==&("\' presence="(>"
,:``\n\n&>2,:` and,"`(=(\'`pool"`_=-\n;`</&=\n\n( and  \n    `>= 4,.:>- is.`:.`()></p. is):=\n        `>>=`: This.",((),> Update\n    }`\n\n`\n\n(" 1 in_=`:&sh\'`*() \n    " check their=\'\
n    /`;``:( ==((=\n\ngt=`(();(\'(`is`_1`>`+),=gt\n\n<=`2}`",fs,{ (lt,().(:```\n and Create`"],`&=`() and);\n    </> Get \n>(,.\n\n pre>\n        \n   </code`\n    (== the\n   .,)0=" ="=)()
{>,._gt > in=(( Time (       ",._gt\',  click ` can  \n(2,`\n\n`(`\n  to:`)]: and ( ",_load and,\n    /:gt(\n   \n(<">() and:log(ck\n    : is =  := The have` or is=("_`}\n    and’. and is
=)."`` that( and which" and &`=\n    = and\\=  $( should and`();\n    \n    _>\n    ;}.  in\n    `\n\n( std_\')`change(\',\'=(\'_ ` ()="\', \'"`', 'speed (token/s)': 6.73}

infwinston avatar Jun 15 '23 07:06 infwinston

Sorry, my bad. I didn't correctly apply the change for falcon_generate_stream. I've fixed it and now it's working. Great work again @ericzhou571!

One thing that may still need fixing: from_pretrained_kwargs should still be passed to Falcon's AutoModelForCausalLM.from_pretrained (e.g., the device_map argument for multi-GPU). See https://github.com/lm-sys/FastChat/pull/1696/files#diff-5b3997bdb6c445a2d2261414fc5bfcc01446ec3909a990ce40158e84bbbe275eR665

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    **from_pretrained_kwargs,
)

infwinston avatar Jun 15 '23 07:06 infwinston

Should we also add device_map to the Falcon training script? Actually, with my script I faced OOM when I tried to train Falcon-40B on 8×A800 80G.
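For context on why the OOM is unsurprising: a back-of-envelope estimate (assuming bf16 weights and gradients plus fp32 AdamW master weights and moments, activations excluded) already saturates the whole cluster:

```python
# Rough memory estimate for full fine-tuning Falcon-40B with AdamW in
# mixed precision. Assumed per-parameter cost: 2 B bf16 weights +
# 2 B bf16 grads + 12 B fp32 optimizer state (master copy + two moments).
params = 40e9
bytes_per_param = 2 + 2 + 12
total_gb = params * bytes_per_param / 1e9
cluster_gb = 8 * 80  # 8 x A800 80GB

print(f"needs ~{total_gb:.0f} GB, cluster has {cluster_gb} GB")
# ~640 GB needed before counting activations, vs 640 GB total
```

So even before activations, 8×80 GB leaves no headroom, which is why sharded approaches (e.g. ZeRO/FSDP offloading or LoRA) are the usual way out rather than a device_map.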

ericzhou571 avatar Jun 16 '23 01:06 ericzhou571

@ericzhou571 would you mind separating the training scripts into another PR? Let's make sure the inference works smoothly in this PR first.

infwinston avatar Jun 16 '23 21:06 infwinston

Besides @infwinston's comments, can you separate training and inference into two separate PRs? We want to merge the inference code as soon as possible and review the training in more detail later.

merrymercy avatar Jun 18 '23 04:06 merrymercy

Sounds great. Should I make another PR?

ericzhou571 avatar Jun 18 '23 11:06 ericzhou571

I've already removed the scripts related to training.

ericzhou571 avatar Jun 18 '23 13:06 ericzhou571

@ericzhou571 sure. Please send another PR.

merrymercy avatar Jun 19 '23 00:06 merrymercy

@ericzhou571 please send another PR for training. Thanks!

merrymercy avatar Jun 19 '23 23:06 merrymercy