
TypeError when calling glm-4-9b-chat (cannot use a string pattern on a bytes-like object)

Open sci-m-wang opened this issue 1 year ago • 7 comments

Describe the issue as clearly as possible:

When I use outlines with glm-4-9b-chat for a classification task, I get the error "cannot use a string pattern on a bytes-like object".

Steps/code to reproduce the bug:

from outlines import models, generate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "glm-4-9b-chat"

# Load the model and tokenizer from a local checkpoint
llm = AutoModelForCausalLM.from_pretrained(f"/datas/huggingface/{model_name}", trust_remote_code=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(f"/datas/huggingface/{model_name}", trust_remote_code=True)

# Wrap the model for outlines and build a constrained-choice generator
model = models.Transformers(llm, tokenizer)
generator = generate.choice(model, ["positive", "negative"])
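For reference, a sketch of the call that would follow once the generator is built (the prompt string here is illustrative, not from the report):

# Hypothetical prompt, just to show the intended usage
print(generator("The movie was fantastic! Sentiment:"))  # expected to print "positive" or "negative"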

Expected result:

positive OR negative

Error message:

Traceback (most recent call last):
  File "/datas/wangm/seeker_status/test_classification_glm.py", line 17, in <module>
    generator = generate.choice(model,["positive","negative"])
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/site-packages/outlines/generate/choice.py", line 17, in choice
    generator = regex(model, regex_str, sampler)
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/site-packages/outlines/generate/regex.py", line 33, in regex
    fsm = RegexGuide(regex_str, model.tokenizer)
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/site-packages/outlines/fsm/guide.py", line 145, in __init__
    ) = create_states_mapping(regex_string, tokenizer)
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/site-packages/outlines/caching.py", line 122, in wrapper
    result = cached_function(*args, **kwargs)
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/site-packages/outlines/fsm/guide.py", line 118, in create_states_mapping
    states_to_token_maps, empty_token_ids = create_fsm_index_tokenizer(
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/site-packages/outlines/fsm/regex.py", line 898, in create_fsm_index_tokenizer
    vocabulary, empty_token_ids = reduced_vocabulary(tokenizer)
  File "/datas/wangm/.conda/envs/llama/lib/python3.10/site-packages/outlines/fsm/regex.py", line 846, in reduced_vocabulary
    if "\ufffd" in token_str and not re_replacement_seq.match(token):
TypeError: cannot use a string pattern on a bytes-like object

Outlines/Python version information:

Version information

0.0.46
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]

Context for the issue:

In my testing, the error occurs at the generator = generate.choice(model, ["positive", "negative"]) line.

sci-m-wang avatar Jul 14 '24 08:07 sci-m-wang

Hi, this also occurs with the regex generator for this model (outlines.generate.regex(model, decoding_regex, sampler=sampler)). @rlouf it would be great if you could look into this. Always thankful for your great contribution.

KawshikManikantan avatar Jul 23 '24 18:07 KawshikManikantan

The same problem occurs with Qwen-family models.

tens444 avatar Jul 30 '24 08:07 tens444

The same problem occurs in vLLM when using "response_format": {"type": "json_object"}.

Dong148 avatar Jul 31 '24 01:07 Dong148

Anyone have a workaround?

abaveja313 avatar Aug 07 '24 09:08 abaveja313

I had the same problem with a glm-4-9b model!

XxxAtlantis avatar Aug 08 '24 07:08 XxxAtlantis

@XxxAtlantis any luck in figuring out a workaround?

abaveja313 avatar Aug 16 '24 02:08 abaveja313

I got the same error and dug into the code and the GLM tokenizer a bit. I believe the direct cause is that the GLM tokenizer does not strictly adhere to the contract of get_vocab() in the Transformers library. According to the documentation (https://huggingface.co/docs/transformers/v4.44.0/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.get_vocab), get_vocab() is supposed to return the vocabulary as strings, and Outlines uses string utilities such as startswith to process the language model's vocabulary and construct its FSM. The GLM tokenizer's get_vocab(), however, returns the vocabulary as bytes, which is incompatible with those string utilities.

Simply converting the byte-based vocabulary to strings does not fix the issue. I tried this approach and discovered that parts of the GLM vocabulary cannot be decoded as UTF-8 strings. It turns out that Qwen and GLM use Byte Pair Encoding (BPE), and some individual tokens are not valid UTF-8 sequences. More information can be found here: https://github.com/QwenLM/Qwen/blob/main/tokenization_note.md
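A quick way to confirm both observations (not part of the original comment; the checkpoint path is an assumption) is to inspect the vocabulary directly:

from transformers import AutoTokenizer

# Assumed checkpoint; any GLM-4 or Qwen tokenizer should behave similarly
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
vocab = tokenizer.get_vocab()

# 1) Key types: GLM returns bytes keys, LLaMA-style tokenizers return str keys
print({type(k) for k in vocab})

def is_valid_utf8(token: bytes) -> bool:
    try:
        token.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 2) Count byte tokens that cannot be decoded as UTF-8 on their own,
#    i.e. the ones that break a naive bytes -> str conversion
bad = sum(1 for k in vocab if isinstance(k, bytes) and not is_valid_utf8(k))
print(f"{bad} of {len(vocab)} tokens are not valid UTF-8 by themselves")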

In conclusion:

The error can only be resolved by having Outlines support BPE byte tokens. Chinese language models like InternLM, which do not use BPE, are more compatible with Outlines.

duming avatar Aug 16 '24 05:08 duming

I have a simple fix for the issue.

It seems glm-4-9b-chat's tokenizer uses BPE slightly differently from LLaMA-style BPE. Specifically, glm-4-9b's tokenizer keeps tokens as raw bytes rather than padding and converting them to strings.
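For context, here is a minimal sketch (not the actual PR code) of the kind of normalization such a fix has to perform: mapping each raw byte of a token to a printable character, GPT-2 style, so that Outlines' string utilities (startswith, regex matching) can operate on the vocabulary. The helper names gpt2_bytes_to_unicode and byte_token_to_str are illustrative only.

def gpt2_bytes_to_unicode() -> dict[int, str]:
    # Map every possible byte value to a printable unicode character (GPT-2 convention)
    bs = list(range(ord("!"), ord("~") + 1)) + \
         list(range(ord("¡"), ord("¬") + 1)) + \
         list(range(ord("®"), ord("ÿ") + 1))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

BYTE_TO_CHAR = gpt2_bytes_to_unicode()

def byte_token_to_str(token) -> str:
    # Convert a bytes token (GLM/Qwen style) into the printable string form
    # Outlines' FSM construction expects; str tokens pass through unchanged
    if isinstance(token, bytes):
        return "".join(BYTE_TO_CHAR[b] for b in token)
    return token

With that change in place, the choice example below runs as expected: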

from outlines import models, generate
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load glm-4-9b-chat from the Hugging Face Hub
llm = AutoModelForCausalLM.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)

model = models.Transformers(llm, tokenizer)

# Constrain generation to one of two Chinese strings
generator = generate.choice(model, ['和长', '本代模'])
# Prompt: "We evaluated the GLM-4-9B-Chat model on several classic tasks and obtained the following results"
input = '我们在一些经典任务上对 GLM-4-9B-Chat 模型进行了评测,并得到了如下的结果'

print(generator(input))
# '和长'

Could you please help me test this PR and ensure it resolves the issues you all have been seeing?

Preview:

pip uninstall -y outlines
pip install --upgrade git+https://github.com/lapp0/outlines@fix-bpe

lapp0 avatar Sep 14 '24 20:09 lapp0

Quoting lapp0's proposed fix above ("pip install --upgrade git+https://github.com/lapp0/outlines@fix-bpe"):

I tried this version of the package, but I got the following error:

  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: libharpcuda.so.0: cannot open shared object file: No such file or directory

sci-m-wang avatar Oct 03 '24 15:10 sci-m-wang

@sci-m-wang can you provide the full traceback? I see a similar error relating to triton in the Unsloth repo https://github.com/unslothai/unsloth/issues/872

lapp0 avatar Oct 04 '24 16:10 lapp0

@sci-m-wang you can have TorchDynamo suppress compilation errors (and fall back to eager execution) with:

import torch._dynamo
torch._dynamo.config.suppress_errors = True

Control-derek avatar Oct 07 '24 14:10 Control-derek

Following up on my suggestion above (torch._dynamo.config.suppress_errors = True):

If this doesn't work, try disabling TorchDynamo (the dynamic graph compiler) entirely as well.

import torch._dynamo
# Disable Dynamo so the model runs eagerly
torch._dynamo.config.disable = True

Control-derek avatar Mar 07 '25 13:03 Control-derek