text-generation-inference
Qwen-7B support
Model description
Qwen-7B is a model that (allegedly) outperforms Llama-2-13B on important benchmarks like MMLU, HumanEval, GSM8K, etc. It has a chat variant as well, and is available for commercial use by organizations with fewer than 100 million monthly active users.
It is architecturally fairly similar to llama, but unfortunately ships custom modelling code, and therefore lacks support in tools like AutoGPTQ.
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
https://github.com/QwenLM/Qwen-7B
https://huggingface.co/Qwen/Qwen-7B-Chat
fairly similar to llama
It seems exactly the same at first glance; maybe just fork the llama implementation and adapt it?
Any plans to support this in text-generation-inference?
Any updates on this?
Is there a TGI alternative that supports ChatGLM3, Qwen, or Baichuan2?
Deploying Qwen to TGI produces this error:
2024-01-05T11:20:53.653089Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 370, in get_model
return CausalLM(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/causal_lm.py", line 528, in __init__
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 971, in add_special_tokens
added_tokens = self.add_tokens(added_tokens, special_tokens=True)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1021, in add_tokens
return self._add_tokens(new_tokens, special_tokens=special_tokens)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-1_8B-Chat/tokenization_qwen.py", line 165, in _add_tokens
raise ValueError("Adding unknown special tokens is not supported")
ValueError: Adding unknown special tokens is not supported
2024-01-05T11:20:54.449399Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 4.56it/s]
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 370, in get_model
return CausalLM(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/causal_lm.py", line 528, in __init__
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 971, in add_special_tokens
added_tokens = self.add_tokens(added_tokens, special_tokens=True)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1021, in add_tokens
return self._add_tokens(new_tokens, special_tokens=special_tokens)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-1_8B-Chat/tokenization_qwen.py", line 165, in _add_tokens
raise ValueError("Adding unknown special tokens is not supported")
ValueError: Adding unknown special tokens is not supported
rank=0
Error: ShardCannotStart
2024-01-05T11:20:54.548225Z ERROR text_generation_launcher: Shard 0 failed to start
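The failure above comes from TGI's generic `CausalLM` fallback, which calls `tokenizer.add_special_tokens({"pad_token": "[PAD]"})` when no pad token is set, while Qwen's custom `tokenization_qwen.py` only accepts special tokens from its fixed vocabulary and raises otherwise. A minimal sketch of the failure mode, using a hypothetical stub in place of the real tokenizer, and the usual workaround of reusing an already-known token as the pad token:

```python
class QwenLikeTokenizer:
    """Hypothetical stub mimicking tokenization_qwen.py's behavior:
    only tokens from a fixed set may be registered as special tokens."""

    KNOWN_SPECIAL = {"<|endoftext|>", "<|im_start|>", "<|im_end|>"}

    def __init__(self):
        self.pad_token = None

    def add_special_tokens(self, mapping):
        for name, token in mapping.items():
            if token not in self.KNOWN_SPECIAL:
                # The guard that raises in the traceback above.
                raise ValueError("Adding unknown special tokens is not supported")
            setattr(self, name, token)


def set_pad_token(tokenizer):
    """What TGI attempts, plus a fallback that avoids the ValueError."""
    try:
        # TGI's CausalLM.__init__ does this when pad_token is missing.
        tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    except ValueError:
        # Reuse a token the tokenizer already knows instead of adding a new one.
        tokenizer.pad_token = "<|endoftext|>"
    return tokenizer.pad_token
```

The same idea applies to the real tokenizer: assigning `tokenizer.pad_token` to an existing token (Qwen uses `<|endoftext|>` as its end-of-text token) sidesteps `add_special_tokens` entirely, though proper support would still require a native TGI implementation of the architecture.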