Regex response type is not respected
System Info
Using TGI through Inference Endpoints, with the endpoint URL shown in the snippet below.
Reproduction
This is the example from the docs:
from huggingface_hub import InferenceClient
client = InferenceClient("https://o9blasawqn0vtw5b.us-east-1.aws.endpoints.huggingface.cloud")
regexp = "((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)"
resp = client.text_generation(
    f"What is Googles DNS? Please use the following regex: {regexp}",
    seed=42,
    grammar={
        "type": "regex",
        "value": regexp,
    },
)
print(resp)
I get output:
1.1.1.1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
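A quick local check with Python's `re` module confirms that this output is not a full match of the pattern, while a plausible answer such as 8.8.8.8 is. This is only a sketch: it tests the regex itself, not TGI's grammar compiler, and the run of 1s below is a shortened stand-in for the actual output.

```python
import re

# Pattern copied from the reproduction above
regexp = "((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)"

# Shortened stand-in for the long run of 1s returned by the endpoint
observed = "1.1.1." + "1" * 50


def matches(text: str) -> bool:
    """True if the whole string matches the pattern, which a regex grammar should guarantee."""
    return re.fullmatch(regexp, text) is not None


print(matches(observed))   # False: the last octet has far more than 3 digits
print(matches("8.8.8.8"))  # True
print(matches("1.1.1.1"))  # True
```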
Expected behavior
I would expect the output to match the regex, so the last octet should be a number between 0 and 255 rather than a long sequence of 1s.
Using the models served on the Inference API instead seems to work, though:
from huggingface_hub import InferenceClient
client = InferenceClient("https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-8B-Instruct")
regexp = "((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)"
resp = client.text_generation(
    f"What is Googles DNS? Please use the following regex: {regexp}",
    seed=42,
    grammar={
        "type": "regex",
        "value": regexp,
    },
)
print(resp)
Thanks for reporting @aymeric-roucher 🙌
I wonder if it's technically respecting the grammar 🤔 My hypothesis:
- each token respects the grammar on its own, since the token is a "1", and that's between 0 and 255
- at the end, instead of ending the sequence, the model just keeps adding the "1" token until the maximum number of tokens is reached
I've seen similar behavior with e.g. "\n", especially on 8B and smaller models.
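The hypothesis can be probed by growing the last octet one "1" at a time and checking each candidate stopping point against the full pattern. This sketch uses Python's `re` module locally and makes no claim about how TGI masks tokens internally. Because the pattern caps the last octet at three digits, a grammar compiled exactly from it should leave end-of-sequence as the only legal step after 1.1.1.111, so a fourth 1 suggests the constraint was not applied exactly as written.

```python
import re

regexp = "((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)"

# Append "1" tokens one at a time, as the model did, and record whether
# stopping at that point would give a full match of the pattern.
results = {}
for n in range(1, 6):
    text = "1.1.1." + "1" * n
    results[n] = re.fullmatch(regexp, text) is not None
    print(text, results[n])

# 1 to 3 trailing 1s match; 4 or more can never match, whatever follows.
```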
What's odd is that you get different results on Inference Endpoints and the Inference API.
I'll ping @drbh for this one as well 👍
Hey @aymeric-roucher, thanks for pointing this out. I believe there were a couple of issues with the regex I originally added to the docs; I think the \\d? notation may have caused subtle issues with the grammar compilation. A similar but simpler pattern would be (((25[0-5]|2[0-4]|[01])\.){3}(25[0-5]|2[0-4]|[01])), which matches 1.1.1.1 with the final 1 appearing only once (note that it only covers a subset of IPv4 addresses). Apologies for any confusion!
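The suggested pattern can be compared with the original on a few addresses using Python's `re` module. This is a local sketch of the regexes only; it does not exercise TGI's grammar compilation. It shows that the simplified pattern rejects a trailing second 1, but also that it only accepts the octets 0, 1, 20 to 24 and 250 to 255, so it rejects addresses such as 8.8.8.8.

```python
import re

# Simplified pattern suggested above, written as a raw string
simple = r"(((25[0-5]|2[0-4]|[01])\.){3}(25[0-5]|2[0-4]|[01]))"
# Original pattern from the reproduction, written as a raw string
original = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

table = {}
for ip in ["1.1.1.1", "1.1.1.11", "8.8.8.8", "192.168.0.1"]:
    # (matches simplified pattern, matches original pattern)
    table[ip] = (
        re.fullmatch(simple, ip) is not None,
        re.fullmatch(original, ip) is not None,
    )
    print(ip, table[ip])
```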
I've just opened a PR to update the docs to use a different (easier to read and reuse) regex here: https://github.com/huggingface/text-generation-inference/pull/2468
Closing, as https://github.com/huggingface/text-generation-inference/pull/2468 has been merged; the updated docs are available here: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_guidance#constrain-with-pydantic