RegexPrefixAllowedTokens does not work for batch
Describe the issue as clearly as possible:
In these lines of code, https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/outlines/integrations/transformers.py#L87-L100
it seems that the batch ID is not used at all, and an error is triggered when batching is used and `input_ids` is empty: https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/outlines/integrations/transformers.py#L116
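For context, this is roughly how transformers invokes the hook (a simplified sketch of `PrefixConstrainedLogitsProcessor` from the transformers source, with beam search handling omitted): the function is called once per sequence in the batch, with that sequence's index and its tokens so far.

```python
import math
import torch

# Simplified sketch of how transformers applies prefix_allowed_tokens_fn
# (beam search handling omitted): one call per sequence in the batch.
def apply_prefix_constraint(input_ids, scores, prefix_allowed_tokens_fn):
    mask = torch.full_like(scores, -math.inf)
    for batch_id, sent in enumerate(input_ids):
        # The hook returns the allowed next-token IDs for this sequence.
        allowed = prefix_allowed_tokens_fn(batch_id, sent)
        mask[batch_id, allowed] = 0
    return scores + mask
```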
Do you know how this can be fixed? Thanks! @saattrupdan @rlouf
Steps/code to reproduce the bug:
```python
# Based on https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/examples/transformers_integration.py#L6
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoProcessor

from outlines.integrations.transformers import JSONPrefixAllowedTokens


class Person(BaseModel):
    first_name: str
    surname: str


processor = AutoProcessor.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

prefix_allowed_tokens_fn = JSONPrefixAllowedTokens(
    schema=Person,
    tokenizer_or_pipe=processor.tokenizer,
)

inputs = processor(...)

model.generate(
    **inputs,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
```
Expected result:
When `inputs` is a batch, this should not trigger an empty-list indexing error.
Error message:
No response
Outlines/Python version information:
Context for the issue:
No response
You are correct, this needs to be fixed.
Thanks @rlouf. I'm looking to fix this too, but just to get more context: it seems like this function generally does not allow parallelized operations across the batch, is that true? I also wonder why an equivalent of JSONLogitsProcessor wasn't implemented instead; is this method faster?
Thank you for considering contributing. I am not sure why this choice was made at the time; you'd have to ask @saattrupdan.
@rlouf what are your thoughts on replacing `outlines/integrations/transformers.py` with a set of logits processor classes (similar to `outlines/integrations/llamacpp.py`) and updating `examples/transformers_integration.py` accordingly?
This would ensure consistency between the different inference engine integrations and move us closer to having one method of applying a Guide to an inference engine / model.
@lapp0 I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn’t need to be used in these prefix allowed tokens functions.
Here’s how I currently use it: https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474
Also, I'm not sure that having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?
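For concreteness, batched use looks roughly like this (a hedged sketch reusing the names from the reproduction above; the pad token setup and the prompts are my assumptions):

```python
# Sketch of batched generation with prefix_allowed_tokens_fn, reusing
# model, processor and prefix_allowed_tokens_fn from the reproduction
# above. Pad token choice and prompts are assumptions for illustration.
processor.tokenizer.pad_token = processor.tokenizer.eos_token
batch = processor.tokenizer(
    ["Describe Alice as JSON:", "Describe Bob as JSON:"],
    return_tensors="pt",
    padding=True,
)
model.generate(
    **batch,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
```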
> @rlouf what are your thoughts on replacing `outlines/integrations/transformers.py` with a set of logits processor classes (similar to `outlines/integrations/llamacpp.py`) and updating `examples/transformers_integration.py` accordingly?
Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn’t one already first.
> @lapp0 I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn't need to be used in these prefix allowed tokens functions.
>
> Here's how I currently use it: https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474
>
> Also, I'm not sure that having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?
It works for me except in the case where no input IDs are passed, as described in the issue. If you pass just a BOS token it works.
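A minimal sketch of what I mean (the direct calls and the empty tensor are my assumptions about how the failure is triggered):

```python
import torch

# Hedged sketch: calling the prefix allowed tokens function directly.
# "tokenizer" stands for whichever tokenizer the function was built with.

# Fails (assumed cause): an empty input_ids sequence leads to indexing an
# empty list inside the integration.
prefix_allowed_tokens_fn(0, torch.tensor([], dtype=torch.long))

# Works: the sequence contains at least the BOS token.
prefix_allowed_tokens_fn(0, torch.tensor([tokenizer.bos_token_id]))
```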
I think the main issue is that this regression was introduced through a distinct interface. vLLM and llama.cpp use a LogitsProcessor; transformers should as well.
> Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn't one already first.
It does exist. We should close this issue in favor of #806.
@lapp0 What I don't understand is maybe exactly what you mean by "replacing the transformers integration with logits processors".
Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.
Further, and most importantly, will such a change still mean that I can take a `transformers` model and call `generate` with the proposed logits processors? I'm not interested in using the outlines wrapper classes.
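For reference, the two interfaces look roughly like this (signatures paraphrased from the transformers documentation):

```python
import torch

# prefix_allowed_tokens_fn: called per sequence in the batch; returns the
# list of *allowed next-token IDs* (ints) for that sequence.
def prefix_allowed_tokens_fn(batch_id: int, input_ids: torch.Tensor) -> list[int]:
    ...

# A logits processor: called once per step on the whole batch; returns
# *modified float scores* over the entire vocabulary.
def logits_processor(input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
    ...
```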
> @lapp0 What I don't understand is maybe exactly what you mean by "replacing the transformers integration with logits processors".
>
> Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.
Logits processors can accomplish the same goal, but they require code changes to use. I see only a handful of public repos using these classes. https://github.com/search?q=%22JSONPrefixAllowedTokens%22&ref=opensearch&type=code
> Further, and most importantly, will such a change still mean that I can take a `transformers` model and call `generate` with the proposed logits processors? I'm not interested in using the outlines wrapper classes.
Yes, I will design it such that you can use it directly with

```python
model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([RegexLogitsProcessor(*args)]),
)
```
@lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?
> @lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?
Here's a dummy example that shows a logits processor which restricts generation to tokens with IDs of 2000 or above. Please make sure you're on the latest version of transformers.

```python
>>> import transformers
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-125m")
>>> model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
>>> model.generate(**tokenizer("hello", return_tensors="pt"))
tensor([[    2, 42891,     6,    38,   437,    10,    92,   869,     8,    38,
           437,   546,    13,    10,   205,   165,     7,   310,    19,     4]])
>>> class NoSmallTokenIDsProcessor(transformers.LogitsProcessor):
...     def __call__(self, input_ids, scores):
...         scores[:, :2000] = float("-inf")  # disallow all token IDs below 2000
...         return scores
>>> model.generate(**tokenizer("hello", return_tensors="pt"), logits_processor=transformers.LogitsProcessorList([NoSmallTokenIDsProcessor()]))
tensor([[    2, 42891,  2598, 39275,  7852, 50118, 13368,  2598, 39275,  7852,
         50118, 12229,  2598, 39275,  7852, 50118, 12229,  2598, 39275,  7852]])
```
@lapp0 Aha, I didn't realise they included that now - thanks for explaining! 😊
@saattrupdan could you please tell me if `pip install git+https://github.com/lapp0/outlines@transformers-use-logits-processor` meets your requirements?
Rendered docs: https://github.com/lapp0/outlines/blob/transformers-use-logits-processor/docs/reference/models/transformers.md#example-direct-transformers-library-use
@lapp0 It's not working as expected. When it's in a loop, only the first generation will succeed.
## Code:

```python
import outlines
import transformers

model_uri = "microsoft/Phi-3-mini-4k-instruct"

outlines_tokenizer = outlines.models.TransformerTokenizer(
    transformers.AutoTokenizer.from_pretrained(model_uri)
)
phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
    r"\+?[1-9][0-9]{7,14}",  # phone number pattern
    outlines_tokenizer,
)

generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)
```

## Output:

```
[{'generated_text': "Jenny gave me her number it's 5551234567"}]
[{'generated_text': "Jenny gave me her number it's "}]
[{'generated_text': "Jenny gave me her number it's "}]
```
@sigjhl logits processors are stateful and intended for one-time use. I'll be adding this to the documentation, thanks for pointing it out.
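Schematically, the failure mode is something like this (a loud caveat: this is an illustration only, not outlines' actual internals):

```python
class SchematicStatefulProcessor:
    """Illustration only, not outlines' actual implementation."""

    def __init__(self):
        # The automaton state persists across generate() calls.
        self.state = "START"

    def __call__(self, input_ids, scores):
        # The state advances as tokens are generated and is never reset, so
        # after one completed generation it sits in a final state where the
        # only permitted continuation is to stop, and later prompts produce
        # no new tokens.
        ...
        return scores
```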
For now you could try

```python
generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    # Recreate the processor each iteration so its internal state is fresh.
    phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
        r"\+?[1-9][0-9]{7,14}",  # phone number pattern
        outlines_tokenizer,
    )
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)
```
Please let me know if you have any other questions.