
[Chatbot Arena] Add Falcon 40B model

Open EwoutH opened this issue 1 year ago • 16 comments

Abu Dhabi's Technology Innovation Institute (TII) just released new 7B and 40B LLMs. The Falcon-40B model is now at the top of the Open LLM Leaderboard, beating llama-30b-supercot and llama-65b among others.

Therefore, I would love to see the Falcon 40B model added to the Chatbot Arena and its leaderboard!

| Model | Revision | Average | ARC (25-shot) | HellaSwag (10-shot) | MMLU (5-shot) | TruthfulQA (0-shot) |
|---|---|---|---|---|---|---|
| tiiuae/falcon-40b | main | 60.4 | 61.9 | 85.3 | 52.7 | 41.7 |
| ausboss/llama-30b-supercot | main | 59.8 | 58.5 | 82.9 | 44.3 | 53.6 |
| llama-65b | main | 58.3 | 57.8 | 84.2 | 48.8 | 42.3 |
| MetaIX/GPT4-X-Alpasta-30b | main | 57.9 | 56.7 | 81.4 | 43.6 | 49.7 |

Press release: UAE's Technology Innovation Institute Launches Open-Source "Falcon 40B" Large Language Model for Research & Commercial Utilization

The Technology Innovation Institute (TII) in Abu Dhabi has announced its open-source large language model (LLM), the Falcon 40B. With 40 billion parameters, Falcon 40B is the UAE's first large-scale AI model, signaling the country's ambition in the field of AI and its commitment to promoting innovation and research.

Unlike most LLMs, which are typically available only for non-commercial use, Falcon 40B is open to both research and commercial usage. TII has also included the model's weights in the open-source package, which will enhance the model's capabilities and allow for more effective fine-tuning.

In addition to the launch of Falcon 40B, the TII has initiated a call for proposals from researchers and visionaries interested in leveraging the model to create innovative use cases or explore further applications. As a reward for exceptional research proposals, selected projects will receive "training compute power" as an investment, allowing for more robust data analysis and complex modeling. VentureOne, the commercialization arm of ATRC, will provide computational resources for the most promising projects.

TII's Falcon 40B has shown impressive performance since its unveiling in March 2023. When benchmarked using Stanford University’s HELM LLM tool, it used less training compute power compared to other renowned LLMs such as OpenAI's GPT-3, DeepMind's Chinchilla AI, and Google's PaLM-62B.

Those interested in accessing Falcon 40B or proposing use cases can do so through the FalconLLM.TII.ae website. Falcon LLMs open-sourced to date are available under a license built upon the principles of the open-source Apache 2.0 software, permitting a broad range of free use.

Hugging Face links

EwoutH avatar May 26 '23 14:05 EwoutH

I spent some time adding and running Falcon-7B-Instruct locally, but the streamed output looks like the model is hallucinating, whereas model.generate on the same input produces correct output. As I am new to the codebase, I'm not entirely sure why the two outputs differ.

model.generate output: [screenshot]

Streamed output: [screenshot]

This is how I initialize the conversation object:

```python
register_conv_template(
    Conversation(
        name="falcon",
        system="""The conversation between human and AI assistant.""",
        roles=("[|Human|]", "[|AI|]"),
        messages=(),
        offset=0,
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="\n",
        stop_str=["\n"],
        stop_token_ids=[193],
    )
)
```

I am not sure why this is happening. Does this have anything to do with how the outputs are handled and fed back to the model during streaming? Any suggestions?
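
For context on where the two paths can diverge: model.generate decodes the whole continuation in one call, while the streaming path runs the model one token at a time, feeding each new token back in together with the cached attention state. Here is a minimal sketch of the two paths (illustrative only, not FastChat's actual generate_stream code; assumes `model` and `tokenizer` are already loaded):

```python
import torch

prompt = "[|Human|]Hello!\n[|AI|]"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# One-shot path: generate() manages the KV cache internally.
reference = model.generate(input_ids, max_new_tokens=32)

# Streaming path: run the prompt once, then feed back one token at a time.
# If past_key_values is mishandled for a new architecture, the two paths diverge.
out = model(input_ids=input_ids, use_cache=True)
past_key_values = out.past_key_values
token = out.logits[:, -1:].argmax(dim=-1)  # greedy decoding, for simplicity
streamed = [token.item()]
for _ in range(31):
    out = model(input_ids=token, use_cache=True, past_key_values=past_key_values)
    past_key_values = out.past_key_values
    token = out.logits[:, -1:].argmax(dim=-1)
    streamed.append(token.item())
```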

OAfzal avatar May 29 '23 19:05 OAfzal

@OAfzal could the random output be because fastchat is loading the model with dtype float16 rather than bfloat16?
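
For reference, a sketch of forcing bfloat16 at load time (Falcon was trained in bfloat16, and at this point its modeling code shipped as custom code, hence `trust_remote_code`):

```python
import torch
from transformers import AutoModelForCausalLM

# Loading Falcon in float16 can produce NaNs or garbage; bfloat16 matches training.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon ships custom modeling code (RWModel)
)
```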

timesler avatar May 30 '23 05:05 timesler

@timesler So loading the model in float16 and running a forward pass results in the following error: [screenshot]

I did not spend too much time debugging this, as I was able to load and run the model with bfloat16. To confirm, I am loading the model in bfloat16 for the above results. I also tried loading it with float32, but that did not help either. I have tried the exact same configuration in a notebook with .generate, and it gives the correct output.

OAfzal avatar May 30 '23 05:05 OAfzal

I tried using model.generate inside the chatloop function and the results are correct. So that confirms that the issue is with generate_stream_func. I will try and inspect it further.

OAfzal avatar May 30 '23 06:05 OAfzal

Weird output, then.

Trangle avatar May 31 '23 12:05 Trangle

```python
register_conv_template(
    Conversation(
        name="falcon",
        system="""The following is a conversation between a human and an AI assistant named Falcon. The human and the AI assistant take turns chatting. Human statements start with [|Human|] and AI assistant statements start with [|AI|]. The AI assistant always provides responses in as much detail as possible, and in Markdown format. The AI assistant always declines to engage with topics, questions and instructions related to unethical, controversial, or sensitive issues. Complete the transcript in exactly that format.\n""",
        roles=("[|Human|]", "[|AI|]"),
        messages=(
            ("[|Human|]", "Hello!"),
            ("[|AI|]", "Hi!"),
        ),
        offset=2,
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="\n",
        stop_str="[|Human|]",
        stop_token_ids=[193],
    )
)
```
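
For anyone trying this, here is roughly how the template renders into a prompt once registered (a sketch assuming this "falcon" template is the one in effect; `SeparatorStyle.NO_COLON_SINGLE` concatenates role + message + sep with no colon, though exact rendering can vary across FastChat versions):

```python
from fastchat.conversation import get_conv_template

conv = get_conv_template("falcon")
conv.append_message(conv.roles[0], "What is the capital of France?")
conv.append_message(conv.roles[1], None)  # leave the AI turn open for generation
print(conv.get_prompt())
# The following is a conversation between a human and an AI assistant named Falcon. ...
# [|Human|]Hello!
# [|AI|]Hi!
# [|Human|]What is the capital of France?
# [|AI|]
```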

Trangle avatar May 31 '23 12:05 Trangle

Hey @Trangle

Could you provide more context for the code you posted? I tried your template too, but the results look no better.

OAfzal avatar May 31 '23 12:05 OAfzal

@OAfzal I just saw that official support has been added for Falcon in https://github.com/huggingface/text-generation-inference, so you may be able to glean some insight there about how to get streaming working

timesler avatar May 31 '23 15:05 timesler

@timesler Ohh that sounds great! I will look that up.

OAfzal avatar May 31 '23 16:05 OAfzal

Double thumbs up on this one now that Falcon is fully open source (Apache 2.0). We should focus our efforts in that direction wherever possible.

digisomni avatar May 31 '23 17:05 digisomni

Is anybody working on this? I'd love to try adding Falcon into this library.

david1542 avatar Jun 03 '23 19:06 david1542

> Hey @Trangle
>
> Could you provide more context for the code you posted? I tried your template too, but the results look no better.

How about this one?

```python
register_conv_template(
    Conversation(
        name="falcon",
        system="",
        roles=("User", "Assistant"),
        messages=[],
        offset=0,
        sep_style=SeparatorStyle.RWKV,
        sep="\n",
        sep2="<|endoftext|>",
        stop_str="\nUser",  # stop_str halts generation and is also stripped from the generated text
        stop_token_ids=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],  # prefer special tokens only, since the tokenizer strips only special tokens
        # stop_token_ids=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 5584, 7932, 32250],
    )
)
```
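
To illustrate the stop_str comment above, this is roughly what the stream-side truncation behaves like (a simplified sketch, not FastChat's actual code):

```python
def truncate_on_stop(text: str, stop_str: str) -> tuple[str, bool]:
    """Cut generated text at stop_str and report whether generation should stop."""
    pos = text.find(stop_str)
    if pos != -1:
        return text[:pos], True  # stop_str itself is removed from the output
    return text, False

# Example: the model starts a new "User" turn, so we cut there and stop.
text, stopped = truncate_on_stop("Paris.\nUser: and Germany?", "\nUser")
assert text == "Paris." and stopped
```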

ericzhou571 avatar Jun 06 '23 06:06 ericzhou571

> I tried using model.generate inside the chatloop function and the results are correct. So that confirms that the issue is with generate_stream_func. I will try and inspect it further.

I found the same thing you did. Have you tried to fix it? Or have you already made a pull request that is waiting to be reviewed?

Best

ericzhou571 avatar Jun 06 '23 06:06 ericzhou571

Hi, could anyone here who successfully ran Falcon submit a pull request?

merrymercy avatar Jun 10 '23 14:06 merrymercy

I found Falcon does not seem to support past_key_values:

```python
model(
    input_ids=torch.as_tensor([[token]], device=device),
    use_cache=True,
    past_key_values=past_key_values,
)
```
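
If the cache path is broken for a model, one blunt workaround (a sketch only, and much slower than proper KV caching) is to re-encode the whole sequence on every step instead of relying on past_key_values:

```python
import torch

def generate_without_cache(model, input_ids, max_new_tokens=32):
    """Greedy decoding that re-runs the full sequence each step (no KV cache)."""
    ids = input_ids
    for _ in range(max_new_tokens):
        logits = model(input_ids=ids, use_cache=False).logits
        next_token = logits[:, -1:].argmax(dim=-1)
        ids = torch.cat([ids, next_token], dim=-1)
    return ids
```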

Tron2016 avatar Jun 12 '23 05:06 Tron2016

@ericzhou571 I seem to have fixed this issue: https://huggingface.co/tiiuae/falcon-40b/discussions/48#64807969bb25a636c9da2cd7

Tron2016 avatar Jun 13 '23 10:06 Tron2016

@Tron2016 Hi, is this your fixed version? You can find it at: https://huggingface.co/tiiuae/falcon-40b/discussions/48#6488434b7fe834f5890b69f8 I'm not sure where I should apply this code. Should it be added to the RWModel file provided by the Falcon weight package? I added support for Falcon to FastChat in this PR: https://github.com/lm-sys/FastChat/pull/1696/files. However, I created a new file specifically for Falcon inference. If your changes can be made in the FastChat code, maybe we can still use FastChat's default generate stream?

ericzhou571 avatar Jun 18 '23 12:06 ericzhou571

> @Tron2016 Hi, is this your fixed version? You can find it at: https://huggingface.co/tiiuae/falcon-40b/discussions/48#6488434b7fe834f5890b69f8 I'm not sure where I should apply this code. Should it be added to the RWModel file provided by the Falcon weight package? I added support for Falcon to FastChat in this PR: https://github.com/lm-sys/FastChat/pull/1696/files. However, I created a new file specifically for Falcon inference. If your changes can be made in the FastChat code, maybe we can still use FastChat's default generate stream?

Yes, the rotary embedding also has a bug that needs to be fixed: https://huggingface.co/tiiuae/falcon-7b/discussions/17#64890b51ce7b9a2abe36b762. The fix should be added to the RWModel file.
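
For reference, and not the exact TII fix, rotary position embeddings are normally applied along these lines; the bug discussed in the linked thread lives in Falcon's version of this code, and dtype mismatches between the cos/sin tables and the query/key tensors are a classic source of garbage output:

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the head dimension in two and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # cos/sin must be computed (and cast) in a dtype consistent with q/k.
    return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
```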

Tron2016 avatar Jun 19 '23 07:06 Tron2016

Hi guys, I tried using the new Falcon changes from main, but it seems like falcon_generate_stream doesn't stop generating text when it should. I opened an issue about that: #1793

You can see there that I started the conversation with "Hello there" and it starts off well, but then it keeps generating tokens.

dudulasry avatar Jun 27 '23 11:06 dudulasry

> Hi guys, I tried using the new Falcon changes from main, but it seems like falcon_generate_stream doesn't stop generating text when it should. I opened an issue about that: #1793

Hi, can you provide more detail about the problem?

  1. Which kind of "doesn't stop generating" are you seeing? Repetition, or generating a whole conversation by itself?
  2. What is your model name? Does it contain "falcon"?
  3. Which model are you using, falcon-xb or falcon-xb-instruct?
  4. Can you check which conversation template is being used? Is the falcon template applied correctly?

Best

ericzhou571 avatar Jun 27 '23 11:06 ericzhou571

@ericzhou571 Hi :) Thanks for the quick reply:

  1. It seems like the model continues to generate new tokens even though it should have stopped. I attached an example in the issue (#1793) where the model started with "Hi there! How can I help you today?" but then continued talking to itself, e.g. "Hi there! How can I help you today?I'm looking ...".
  2. I'm using tiiuae/falcon-7b-instruct.
  3. Same as point 2: I'm using tiiuae/falcon-7b-instruct. The name should be derived from the path, as I understand 🧐
  4. I tried to debug it, and it looks like it uses the correct template (the falcon template from the main branch):

[screenshot]

dudulasry avatar Jun 27 '23 12:06 dudulasry

@dudulasry I made a mistake: I pushed a conversation template without a system message to the FastChat main repo. If you add a system message, everything works well.
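
For anyone hitting the same problem, the fix amounts to registering the falcon template with a non-empty system message, along these lines (illustrative only; the exact wording and fields in the repo may differ):

```python
register_conv_template(
    Conversation(
        name="falcon",
        # A non-empty system message; without it the model tends to keep talking to itself.
        system="The following is a conversation between a curious user and a helpful AI assistant.",
        roles=("User", "Assistant"),
        messages=[],
        offset=0,
        sep_style=SeparatorStyle.RWKV,
        sep="\n",
        sep2="<|endoftext|>",
        stop_str="\nUser",
        stop_token_ids=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    )
)
```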

ericzhou571 avatar Jun 28 '23 02:06 ericzhou571

What's the status of this? What needs to be done?

Randl avatar Jul 21 '23 02:07 Randl

Falcon is now supported.

merrymercy avatar Jul 21 '23 09:07 merrymercy