RegexPrefixAllowedTokens does not work for batch
Describe the issue as clearly as possible:
In these lines of code, https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/outlines/integrations/transformers.py#L87-L100
it seems that the batch ID is not used at all, and an error is triggered when batching is used and `input_ids` is empty: https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/outlines/integrations/transformers.py#L116
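For context, this is roughly how transformers invokes the hook (a simplified sketch of `PrefixConstrainedLogitsProcessor` from the transformers source, with beam search handling omitted): the function is called once per sequence in the batch, with that sequence's index and its tokens so far.

```python
import math
import torch

# Simplified sketch of how transformers applies prefix_allowed_tokens_fn
# (beam search handling omitted): one call per sequence in the batch.
def apply_prefix_constraint(input_ids, scores, prefix_allowed_tokens_fn):
    mask = torch.full_like(scores, -math.inf)
    for batch_id, sent in enumerate(input_ids):
        # The hook returns the allowed next-token IDs for this sequence.
        allowed = prefix_allowed_tokens_fn(batch_id, sent)
        mask[batch_id, allowed] = 0
    return scores + mask
```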
Do you know how this can be fixed? Thanks! @saattrupdan @rlouf
Steps/code to reproduce the bug:
```python
# Based on https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/examples/transformers_integration.py#L6
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoProcessor

from outlines.integrations.transformers import JSONPrefixAllowedTokens


class Person(BaseModel):
    first_name: str
    surname: str


processor = AutoProcessor.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

prefix_allowed_tokens_fn = JSONPrefixAllowedTokens(
    schema=Person,
    tokenizer_or_pipe=processor.tokenizer,
)

inputs = processor(...)

model.generate(
    **inputs,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
```
Expected result:
When `inputs` is a batch, this should not trigger an empty-list indexing error.
Error message:
No response
Outlines/Python version information:
Context for the issue:
No response
You are correct, this needs to be fixed.
Thanks @rlouf. I'm looking to fix this too, but just to get more context: it seems like this function generally does not allow parallelized operations across the batch, is that true? I also wonder why an equivalent of JSONLogitsProcessor wasn't implemented instead; is this method faster?
Thank you for considering contributing. I am not sure why this choice was made at the time; you'd have to ask @saattrupdan.
@rlouf what are your thoughts on replacing `outlines/integrations/transformers.py` with a set of logits processor classes (similar to `outlines/integrations/llamacpp.py`) and updating `examples/transformers_integration.py` accordingly?
This would ensure consistency between the different inference engine integrations and move us closer to having one method of applying a Guide to an inference engine / model.
@lapp0 I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn’t need to be used in these prefix allowed tokens functions.
Here’s how I currently use it: https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474
Also, I'm not sure that having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?
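For concreteness, batched use looks roughly like this (a hedged sketch reusing the names from the reproduction above; the pad token setup and the prompts are my assumptions):

```python
# Sketch of batched generation with prefix_allowed_tokens_fn, reusing
# model, processor and prefix_allowed_tokens_fn from the reproduction
# above. Pad token choice and prompts are assumptions for illustration.
processor.tokenizer.pad_token = processor.tokenizer.eos_token
batch = processor.tokenizer(
    ["Describe Alice as JSON:", "Describe Bob as JSON:"],
    return_tensors="pt",
    padding=True,
)
model.generate(
    **batch,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
```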
> @rlouf what are your thoughts on replacing `outlines/integrations/transformers.py` with a set of logits processor classes (similar to `outlines/integrations/llamacpp.py`) and updating `examples/transformers_integration.py` accordingly?
Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn’t one already first.
> @lapp0 I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn't need to be used in these prefix allowed tokens functions.
>
> Here's how I currently use it: https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474
>
> Also, I'm not sure that having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?
It works for me except in the case where no input IDs are passed, as described in the issue. If you pass just a BOS token it works.
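A minimal sketch of what I mean (the direct calls and the empty tensor are my assumptions about how the failure is triggered):

```python
import torch

# Hedged sketch: calling the prefix allowed tokens function directly.
# "tokenizer" stands for whichever tokenizer the function was built with.

# Fails (assumed cause): an empty input_ids sequence leads to indexing an
# empty list inside the integration.
prefix_allowed_tokens_fn(0, torch.tensor([], dtype=torch.long))

# Works: the sequence contains at least the BOS token.
prefix_allowed_tokens_fn(0, torch.tensor([tokenizer.bos_token_id]))
```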
I think the main issue is that this regression was introduced through a distinct interface. vLLM and llama.cpp use a LogitsProcessor; transformers should as well.
> Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn't one already first.
It does exist. We should close this issue in favor of #806.
@lapp0 What I don't understand is maybe exactly what you mean by "replacing the transformers integration with logits processors".
Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.
Further, and most importantly, will such a change still mean that I can take a `transformers` model and call `generate` with the proposed logits processors? I'm not interested in using the outlines wrapper classes.
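For reference, the two interfaces look roughly like this (signatures paraphrased from the transformers documentation):

```python
import torch

# prefix_allowed_tokens_fn: called per sequence in the batch; returns the
# list of *allowed next-token IDs* (ints) for that sequence.
def prefix_allowed_tokens_fn(batch_id: int, input_ids: torch.Tensor) -> list[int]:
    ...

# A logits processor: called once per step on the whole batch; returns
# *modified float scores* over the entire vocabulary.
def logits_processor(input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
    ...
```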
> @lapp0 What I don't understand is maybe exactly what you mean by "replacing the transformers integration with logits processors".
>
> Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.
Logits processors can accomplish the same goal, but they require code changes to use. I see only a handful of public repos using these classes. https://github.com/search?q=%22JSONPrefixAllowedTokens%22&ref=opensearch&type=code
> Further, and most importantly, will such a change still mean that I can take a `transformers` model and call `generate` with the proposed logits processors? I'm not interested in using the outlines wrapper classes.
Yes, I will design it such that you can use it directly with

```python
model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([RegexLogitsProcessor(*args)]),
)
```
@lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?
> @lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?
Here's a dummy example that shows a logits processor which restricts generation to tokens with IDs of 2000 or above. Please make sure you're on the latest version of transformers.

```python
>>> import transformers
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-125m")
>>> model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
>>> model.generate(**tokenizer("hello", return_tensors="pt"))
tensor([[    2, 42891,     6,    38,   437,    10,    92,   869,     8,    38,
           437,   546,    13,    10,   205,   165,     7,   310,    19,     4]])
>>> class NoSmallTokenIDsProcessor(transformers.LogitsProcessor):
...     def __call__(self, input_ids, scores):
...         scores[:, :2000] = float("-inf")  # disallow all token IDs below 2000
...         return scores
>>> model.generate(**tokenizer("hello", return_tensors="pt"), logits_processor=transformers.LogitsProcessorList([NoSmallTokenIDsProcessor()]))
tensor([[    2, 42891,  2598, 39275,  7852, 50118, 13368,  2598, 39275,  7852,
         50118, 12229,  2598, 39275,  7852, 50118, 12229,  2598, 39275,  7852]])
```
@lapp0 Aha, I didn't realise they included that now - thanks for explaining! 😊
@saattrupdan could you please tell me if `pip install git+https://github.com/lapp0/outlines@transformers-use-logits-processor` meets your requirements?
Rendered docs: https://github.com/lapp0/outlines/blob/transformers-use-logits-processor/docs/reference/models/transformers.md#example-direct-transformers-library-use
@lapp0 It's not working as expected. When it's in a loop, only the first generation will succeed.
## Code:

```python
import outlines
import transformers

model_uri = "microsoft/Phi-3-mini-4k-instruct"

outlines_tokenizer = outlines.models.TransformerTokenizer(
    transformers.AutoTokenizer.from_pretrained(model_uri)
)
phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
    r"\+?[1-9][0-9]{7,14}",  # phone number pattern
    outlines_tokenizer,
)

generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)
```

## Output:

```
[{'generated_text': "Jenny gave me her number it's 5551234567"}]
[{'generated_text': "Jenny gave me her number it's "}]
[{'generated_text': "Jenny gave me her number it's "}]
```
@sigjhl logits processors are stateful and intended for one-time use. I'll be adding this to the documentation, thanks for pointing it out.
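Schematically, the failure mode is something like this (a loud caveat: this is an illustration only, not outlines' actual internals):

```python
class SchematicStatefulProcessor:
    """Illustration only, not outlines' actual implementation."""

    def __init__(self):
        # The automaton state persists across generate() calls.
        self.state = "START"

    def __call__(self, input_ids, scores):
        # The state advances as tokens are generated and is never reset, so
        # after one completed generation it sits in a final state where the
        # only permitted continuation is to stop, and later prompts produce
        # no new tokens.
        ...
        return scores
```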
For now you could try

```python
generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    # Recreate the processor each iteration so its internal state is fresh.
    phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
        r"\+?[1-9][0-9]{7,14}",  # phone number pattern
        outlines_tokenizer,
    )
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)
```
Please let me know if you have any other questions.