text-generation-inference Regex response type is not respected

System Info

Using TGI through Inference Endpoints with this endpoint.

Reproduction

from huggingface_hub import InferenceClient

client = InferenceClient("https://o9blasawqn0vtw5b.us-east-1.aws.endpoints.huggingface.cloud")

regexp = "((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)"

resp = client.text_generation(
    f"What is Googles DNS? Please use the following regex: {regexp}",
    seed=42,
    grammar={
        "type": "regex",
        "value": regexp,
    },
)

print(resp)

I get output: 1.1.1.1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

Expected behavior

I would expect the output to match the regex, so the last number should be between 0 and 255, not a long sequence of 1s.
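A minimal sketch of how the mismatch can be checked locally, assuming Python's standard `re` module is an acceptable stand-in for the server-side grammar (it may not behave identically). The helper name `respects_regex` is hypothetical and not part of any library:

```python
import re

# Same pattern as in the reproduction, written as a raw string literal.
IPV4_PATTERN = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"


def respects_regex(text: str, pattern: str = IPV4_PATTERN) -> bool:
    """Return True only if the *entire* string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Shortened stand-in for the output reported above.
observed = "1.1.1." + "1" * 40

print(respects_regex(observed))   # the whole string is checked
print(respects_regex("8.8.8.8"))

# re.match only anchors at the start, so it accepts a valid prefix
# and silently ignores the trailing run of 1s.
prefix = re.match(IPV4_PATTERN, observed)
print(prefix.group(0))
```

The distinction between `re.fullmatch` and `re.match` matters here: the reported output starts with a valid address, so only a whole-string check exposes the problem.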

Jul 26 '24 11:07 aymeric-roucher

Using the models served on the Inference API instead seems to work, though:

from huggingface_hub import InferenceClient

client = InferenceClient("https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-8B-Instruct")

regexp = "((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)"

resp = client.text_generation(
    f"What is Googles DNS? Please use the following regex: {regexp}",
    seed=42,
    grammar={
        "type": "regex",
        "value": regexp,
    },
)

print(resp)

Jul 26 '24 11:07 aymeric-roucher

Thanks for reporting @aymeric-roucher 🙌

I wonder if it's technically respecting the grammar 🤔 my hypothesis:

Every token respects the grammar since each token is a "1", which is between 0 and 255.
At the end, instead of ending the sequence, the model just keeps adding the "1" token until max tokens is reached.

I've seen similar behavior with e.g. "\n", especially on 8B and smaller models.
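One way to probe the hypothesis above is to ask which digit prefixes can still be extended into a valid final octet. The sketch below is a brute-force check with Python's `re`; it is not how TGI compiles grammars, and `ACCEPTED` and `can_continue` are hypothetical names used only for illustration:

```python
import re
from itertools import product

# The octet part of the pattern from the reproduction.
OCTET = r"25[0-5]|2[0-4]\d|[01]?\d\d?"

# Every digit string of length 1 to 3 that the octet pattern fully accepts.
# The pattern cannot match more than three characters, so this is exhaustive.
ACCEPTED = {
    "".join(digits)
    for length in (1, 2, 3)
    for digits in product("0123456789", repeat=length)
    if re.fullmatch(OCTET, "".join(digits))
}


def can_continue(prefix: str) -> bool:
    """True if `prefix` is the start of at least one accepted octet."""
    return any(octet.startswith(prefix) for octet in ACCEPTED)


for prefix in ("1", "11", "111", "1111"):
    print(prefix, can_continue(prefix))
```

Under this reading, "1", "11" and "111" are all valid so far, but a fourth "1" is not a legal continuation. This only examines the regex itself and says nothing about how the served model tokenizes digits or how the grammar is compiled server-side.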

What's weird is that you get a different result on the Inference Endpoint than on the Inference API.

I'll ping @drbh for this one as well 👍

Jul 29 '24 08:07 ErikKaum

Hey @aymeric-roucher thanks for pointing this out. I believe there were a couple of issues with the regex I originally added to the docs. I think the \\d? notation may have caused subtle issues with the grammar compilation. A similar but simpler, valid IP grammar would be (((25[0-5]|2[0-4]|[01])\.){3}(25[0-5]|2[0-4]|[01])), which would match 1.1.1.1, where the final 1 should only appear once. Apologies for any confusion!
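For anyone comparing the two patterns locally, here is a small sketch using Python's `re` module, which may not behave exactly like TGI's grammar compilation. The `compare` helper is a hypothetical name for illustration:

```python
import re

# Pattern from the original report.
ORIGINAL = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
# Simplified pattern suggested in the comment above.
SIMPLIFIED = r"(((25[0-5]|2[0-4]|[01])\.){3}(25[0-5]|2[0-4]|[01]))"


def compare(text: str) -> tuple[bool, bool]:
    """Return (matches ORIGINAL, matches SIMPLIFIED) using whole-string matching."""
    return (
        re.fullmatch(ORIGINAL, text) is not None,
        re.fullmatch(SIMPLIFIED, text) is not None,
    )


for candidate in ("1.1.1.1", "1.1.1.11", "8.8.8.8"):
    print(candidate, compare(candidate))
```

Under Python's `re`, the simplified pattern accepts only the octets 0, 1, 20 to 24, and 250 to 255, so it matches 1.1.1.1 but is narrower than a general IPv4 matcher (8.8.8.8, for instance, is rejected). It may be worth checking it against the addresses you actually expect.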

I've just opened a PR to update the docs to use a different regex (easier to read and reuse) here: https://github.com/huggingface/text-generation-inference/pull/2468

Aug 28 '24 17:08 drbh

Closing, as https://github.com/huggingface/text-generation-inference/pull/2468 was merged and the updated docs are available here: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_guidance#constrain-with-pydantic

Aug 28 '24 17:08 drbh