text-generation-inference
Qwen-7B support
Model description
Qwen-7B is a model that (allegedly) outperforms Llama-2-13B on important benchmarks like MMLU, HumanEval, GSM8K, etc. It has a chat variant as well, and is available for commercial use by organizations with fewer than 100 million monthly active users.
It is architecturally fairly similar to llama, but unfortunately ships custom modelling code, and therefore lacks support in tools like AutoGPTQ.
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
https://github.com/QwenLM/Qwen-7B
https://huggingface.co/Qwen/Qwen-7B-Chat
fairly similar to llama
It seems exactly the same at first glance; maybe just fork the llama implementation and adapt it?
Any plans to support this in text-generation-inference?
Any updates on this?
Is there a TGI alternative that supports ChatGLM3, Qwen, or Baichuan2?
Deploying Qwen to TGI produces this error:
2024-01-05T11:20:53.653089Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 370, in get_model
return CausalLM(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/causal_lm.py", line 528, in __init__
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 971, in add_special_tokens
added_tokens = self.add_tokens(added_tokens, special_tokens=True)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1021, in add_tokens
return self._add_tokens(new_tokens, special_tokens=special_tokens)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-1_8B-Chat/tokenization_qwen.py", line 165, in _add_tokens
raise ValueError("Adding unknown special tokens is not supported")
ValueError: Adding unknown special tokens is not supported
2024-01-05T11:20:54.449399Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 4.56it/s]
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 370, in get_model
return CausalLM(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/causal_lm.py", line 528, in __init__
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 971, in add_special_tokens
added_tokens = self.add_tokens(added_tokens, special_tokens=True)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1021, in add_tokens
return self._add_tokens(new_tokens, special_tokens=special_tokens)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-1_8B-Chat/tokenization_qwen.py", line 165, in _add_tokens
raise ValueError("Adding unknown special tokens is not supported")
ValueError: Adding unknown special tokens is not supported
rank=0
Error: ShardCannotStart
2024-01-05T11:20:54.548225Z ERROR text_generation_launcher: Shard 0 failed to start
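The failure above comes from TGI's generic `CausalLM` fallback, which calls `tokenizer.add_special_tokens({"pad_token": "[PAD]"})` when no pad token is set, while Qwen's custom `tokenization_qwen.py` only accepts special tokens from its fixed vocabulary and raises otherwise. A minimal sketch of the failure mode, using a hypothetical stub in place of the real tokenizer, and the usual workaround of reusing an already-known token as the pad token:

```python
class QwenLikeTokenizer:
    """Hypothetical stub mimicking tokenization_qwen.py's behavior:
    only tokens from a fixed set may be registered as special tokens."""

    KNOWN_SPECIAL = {"<|endoftext|>", "<|im_start|>", "<|im_end|>"}

    def __init__(self):
        self.pad_token = None

    def add_special_tokens(self, mapping):
        for name, token in mapping.items():
            if token not in self.KNOWN_SPECIAL:
                # The guard that raises in the traceback above.
                raise ValueError("Adding unknown special tokens is not supported")
            setattr(self, name, token)


def set_pad_token(tokenizer):
    """What TGI attempts, plus a fallback that avoids the ValueError."""
    try:
        # TGI's CausalLM.__init__ does this when pad_token is missing.
        tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    except ValueError:
        # Reuse a token the tokenizer already knows instead of adding a new one.
        tokenizer.pad_token = "<|endoftext|>"
    return tokenizer.pad_token
```

The same idea applies to the real tokenizer: assigning `tokenizer.pad_token` to an existing token (Qwen uses `<|endoftext|>` as its end-of-text token) sidesteps `add_special_tokens` entirely, though proper support would still require a native TGI implementation of the architecture.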