text-generation-inference CodeQwen1.5 not working

When trying to use CodeQwen1.5 7B AWQ, I get the error below. I also tried the fp16 weights directly from Qwen and other AWQ versions, with the same result. Launcher arguments and traceback:

--max-input-length 4096 --max-total-tokens 9000 --model-id TechxGenus/CodeQwen1.5-7B-AWQ --max-batch-prefill-tokens 4096 --num-shard 4 --quantize awq --cuda-memory-fraction 0.6

2024-04-21T23:38:47.952625Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 240, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 201, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 532, in get_model
    return FlashQwen2(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_qwen2.py", line 45, in __init__
    tokenizer = Qwen2Tokenizer.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/qwen2/tokenization_qwen2.py", line 172, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
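
The last frame shows `vocab_file` arriving as `None`, which is what the slow `Qwen2Tokenizer` would see if the checkpoint does not provide the `vocab.json` it expects. A minimal sketch for checking a locally downloaded snapshot, assuming the usual transformers file names (`vocab.json` and `merges.txt` for the slow tokenizer, `tokenizer.json` for the fast one). The function name `tokenizer_file_report` is hypothetical and not part of text-generation-inference:

```python
from pathlib import Path

# File names assumed from the transformers Qwen2 tokenizer classes.
SLOW_TOKENIZER_FILES = ("vocab.json", "merges.txt")
FAST_TOKENIZER_FILE = "tokenizer.json"


def tokenizer_file_report(model_dir):
    """Report which tokenizer files exist in a local model directory."""
    root = Path(model_dir)
    # Collect every slow-tokenizer file that is absent from the directory.
    missing_slow = [
        name for name in SLOW_TOKENIZER_FILES if not (root / name).is_file()
    ]
    return {
        "slow_tokenizer_loadable": not missing_slow,
        "missing_slow_files": missing_slow,
        "fast_tokenizer_file_present": (root / FAST_TOKENIZER_FILE).is_file(),
    }
```

If the report shows `tokenizer.json` present while `vocab.json` is missing, that would be consistent with the `TypeError` above, though it does not prove it is the only cause.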

Apr 21 '24 23:04 Ichigo3766

@Narsil Any insight into this that could help me run this model? I'm not sure what I am missing.

Apr 24 '24 01:04 Ichigo3766

Found the issue. Inside flash_qwen2.py, the hard-coded tokenizer class (Qwen2Tokenizer) fails to load this model. I switched the tokenizer and config to this:

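# At the top of flash_qwen2.py:
from transformers import AutoConfig, AutoTokenizer

# Let transformers pick the tokenizer class that matches the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    revision=revision,
    padding_side="left",
    truncation_side="left",
    trust_remote_code=trust_remote_code,
)
# Resolve the config class the same way.
config = AutoConfig.from_pretrained(
    model_id, revision=revision, trust_remote_code=trust_remote_code
)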

Now able to use the model as expected.
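
The change above swaps the hard-coded tokenizer class for `AutoTokenizer`. A gentler variant might keep the original class as the first choice and fall back only when it fails. The sketch below shows that pattern with the loaders injected as plain callables, so it runs without transformers installed. `load_first_working` and the two stand-in loaders are hypothetical names, not part of text-generation-inference:

```python
def load_first_working(loaders, errors=(TypeError, OSError)):
    """Call each zero-argument loader in order and return (name, result)
    for the first one that does not raise one of `errors`."""
    failures = []
    for name, loader in loaders:
        try:
            return name, loader()
        except errors as exc:
            failures.append(f"{name}: {exc!r}")
    raise RuntimeError("no tokenizer loader succeeded: " + "; ".join(failures))


# Stand-ins that mimic the behaviour seen in this thread.
def slow_loader():
    # Same failure as the traceback: open() called with a missing vocab path.
    open(None, encoding="utf-8")


def auto_loader():
    return "tokenizer-from-auto-class"


name, tokenizer = load_first_working(
    [("Qwen2Tokenizer", slow_loader), ("AutoTokenizer", auto_loader)]
)
print(name)  # AutoTokenizer
```

In real code the two loaders would be lambdas wrapping `Qwen2Tokenizer.from_pretrained(...)` and `AutoTokenizer.from_pretrained(...)` with the same arguments as in the snippet above.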

Apr 24 '24 02:04 Ichigo3766

To add to this, the tokenizer config path is not passed into the models. This makes a custom config impossible.

Apr 24 '24 16:04 sam-ulrich1

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

May 25 '24 01:05 github-actions[bot]

I got it working by changing config.json and tokenizer.json: https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat/commit/91ffe86a74d00f76a75371d58a70ae5fe1bc0f29

May 25 '24 09:05 Grey4sh

To add to this, the tokenizer config path is not passed into the models. This makes a custom config impossible.

https://github.com/huggingface/text-generation-inference/issues/1785#issuecomment-2131148602

May 25 '24 09:05 Grey4sh

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Jun 26 '24 01:06 github-actions[bot]

Still unable to get this working.

2024-07-15T21:09:45.769744Z ERROR warmup{max_input_length=4096 max_prefill_tokens=5000 max_total_tokens=7000 max_batch_size=None}:warmup: text_generation_client: router/client/src/lib.rs:46: Server error: clamp() received an invalid combination of arguments - got (Seqlen, max=Tensor), but expected one of:
 * (Tensor input, Tensor min, Tensor max, *, Tensor out)
 * (Tensor input, Number min, Number max, *, Tensor out)

Error: WebServer(Warmup(Generation("clamp() received an invalid combination of arguments - got (Seqlen, max=Tensor), but expected one of:\n * (Tensor input, Tensor min, Tensor max, *, Tensor out)\n * (Tensor input, Number min, Number max, *, Tensor out)\n")))
2024-07-15T21:09:45.852870Z ERROR text_generation_launcher: Webserver Crashed
2024-07-15T21:09:45.852896Z INFO text_generation_launcher: Shutting down shards

Jul 15 '24 21:07 Ichigo3766
