
[bug] Custom LLM_Callable for ProvenanceV1 causes ValueError: invalid syntax during validator parse_token

Open e-m-albright opened this issue 1 year ago • 3 comments

Hello! I'm trying out the ProvenanceV1 validator with a custom llm_callable that conforms to the expected string-in, string-out signature. Under the hood it calls Google's Vertex AI text generation endpoint, which works fine on its own. Here's my setup:

from guardrails import Guard
from guardrails.validators import ProvenanceV1

vertexai_llm = ...

def predict_function(prompt: str) -> str:
    response = vertexai_llm.text_generative_model.predict(prompt)
    return response.text


guard_1 = Guard.from_string(
    validators=[
        ProvenanceV1(
            validation_method="sentence",
            llm_callable=predict_function, 
            top_k=3,
            max_tokens=2,
            on_fail="fix",
        )
    ],
    description="testmeout",
)

When I run this I get an error that I'm having a lot of trouble breaking apart into what's happening and what I need to do to remedy it. It seems like somewhere around reasking the library is running eval on the function?

---------------------------------------------------------------------------
SyntaxError                               Traceback (most recent call last)
File /usr/local/lib/python3.10/site-packages/guardrails/schema.py:144, in FormatAttr.parse_token(cls, token)
    142 try:
    143     # Evaluate the Python expression.
--> 144     t = eval(t)
    145 except (ValueError, SyntaxError, NameError) as e:

SyntaxError: invalid syntax (<string>, line 1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[41], line 6
      2     response = vertexai_llm.text_generative_model.predict(prompt)
      3     return response.text
----> 6 guard_1 = Guard.from_string(
      7     validators=[
      8         ProvenanceV1(
      9             validation_method="sentence",  # can be "sentence" or "full"
     10             llm_callable=predict_function,  # as explained above
     11             # llm_callable="gpt-3.5-turbo",  # as explained above
     12             top_k=3,  # number of chunks to retrieve
     13             max_tokens=500,
     14             on_fail="fix",
     15         )
     16     ],
     17     description="testmeout",
     18 )

File /usr/local/lib/python3.10/site-packages/guardrails/guard.py:203, in Guard.from_string(cls, validators, description, prompt, instructions, reask_prompt, reask_instructions, num_reasks)
    180 @classmethod
    181 def from_string(
    182     cls,
   (...)
    189     num_reasks: int = None,
    190 ) -> "Guard":
    191     """Create a Guard instance for a string response with prompt,
    192     instructions, and validations.
    193 
   (...)
    201         num_reasks (int, optional): The max times to re-ask the LLM for invalid output.
    202     """  # noqa
--> 203     rail = Rail.from_string_validators(
    204         validators=validators,
    205         description=description,
    206         prompt=prompt,
    207         instructions=instructions,
    208         reask_prompt=reask_prompt,
    209         reask_instructions=reask_instructions,
    210     )
    211     return cls(rail, num_reasks=num_reasks)

File /usr/local/lib/python3.10/site-packages/guardrails/rail.py:145, in Rail.from_string_validators(cls, validators, description, prompt, instructions, reask_prompt, reask_instructions)
    127 @classmethod
    128 def from_string_validators(
    129     cls,
   (...)
    135     reask_instructions: Optional[str] = None,
    136 ):
    137     xml = generate_xml_code(
    138         prompt=prompt,
    139         instructions=instructions,
   (...)
    143         description=description,
    144     )
--> 145     return cls.from_xml(xml)

File /usr/local/lib/python3.10/site-packages/guardrails/rail.py:99, in Rail.from_xml(cls, xml)
     97 if reask_instructions is not None:
     98     reask_instructions = reask_instructions.text
---> 99 output_schema = cls.load_output_schema(
    100     raw_output_schema,
    101     reask_prompt=reask_prompt,
    102     reask_instructions=reask_instructions,
    103 )
    105 # Parse instructions for the LLM. These are optional but if given,
    106 # LLMs can use them to improve their output. Commonly these are
    107 # prepended to the prompt.
    108 instructions = xml.find("instructions")

File /usr/local/lib/python3.10/site-packages/guardrails/rail.py:177, in Rail.load_output_schema(root, reask_prompt, reask_instructions)
    175 # If root contains a `type="string"` attribute, then it's a StringSchema
    176 if "type" in root.attrib and root.attrib["type"] == "string":
--> 177     return StringSchema(
    178         root,
    179         reask_prompt_template=reask_prompt,
    180         reask_instructions_template=reask_instructions,
    181     )
    182 return JsonSchema(
    183     root,
    184     reask_prompt_template=reask_prompt,
    185     reask_instructions_template=reask_instructions,
    186 )

File /usr/local/lib/python3.10/site-packages/guardrails/schema.py:785, in StringSchema.__init__(self, root, reask_prompt_template, reask_instructions_template)
    778 def __init__(
    779     self,
    780     root: ET._Element,
    781     reask_prompt_template: Optional[str] = None,
    782     reask_instructions_template: Optional[str] = None,
    783 ) -> None:
    784     self.string_key = "string"
--> 785     super().__init__(root)
    787     # Setup reask templates
    788     self._reask_prompt_template = reask_prompt_template

File /usr/local/lib/python3.10/site-packages/guardrails/schema.py:293, in Schema.__init__(self, root, schema, reask_prompt_template, reask_instructions_template)
    291 self.root = root
    292 if root is not None:
--> 293     self.setup_schema(root)
    295 # Setup reask templates
    296 self.check_valid_reask_prompt(reask_prompt_template)

File /usr/local/lib/python3.10/site-packages/guardrails/schema.py:806, in StringSchema.setup_schema(self, root)
    804 # make root tag into a string tag
    805 root_string = ET.Element("string", root.attrib)
--> 806 self[self.string_key] = String.from_xml(root_string)

File /usr/local/lib/python3.10/site-packages/guardrails/datatypes.py:145, in DataType.from_xml(cls, element, strict)
    141 # TODO: don't want to pass strict through to DataType,
    142 # but need to pass it to FormatAttr.from_element
    143 # how to handle this?
    144 format_attr = FormatAttr.from_element(element)
--> 145 format_attr.get_validators(strict)
    147 data_type = cls({}, format_attr, element)
    148 data_type.set_children(element)

File /usr/local/lib/python3.10/site-packages/guardrails/schema.py:215, in FormatAttr.get_validators(self, strict)
    213 _validators = []
    214 _unregistered_validators = []
--> 215 parsed = self.parse().items()
    216 for validator_name, args in parsed:
    217     # Check if the validator is registered for this element.
    218     # The validators in `format` that are not registered for this element
    219     # will be ignored (with an error or warning, depending on the value of
    220     # `strict`), and the registered validators will be returned.
    221     if validator_name not in types_to_validators[self.element.tag]:

File /usr/local/lib/python3.10/site-packages/guardrails/schema.py:169, in FormatAttr.parse(self)
    166 validators = {}
    167 for token in self.tokens:
    168     # Parse the token into a validator name and a list of parameters.
--> 169     validator_name, args = self.parse_token(token)
    170     validators[validator_name] = args
    172 return validators

File /usr/local/lib/python3.10/site-packages/guardrails/schema.py:146, in FormatAttr.parse_token(cls, token)
    144             t = eval(t)
    145         except (ValueError, SyntaxError, NameError) as e:
--> 146             raise ValueError(
    147                 f"Python expression `{t}` is not valid, "
    148                 f"and raised an error: {e}."
    149             )
    150     args.append(t)
    152 return validator.strip(), args

ValueError: Python expression `<function predict_function at 0xffff4c7255a0>` is not valid, and raised an error: invalid syntax (<string>, line 1).

I'm running guardrails-ai==0.2.4 and was basing my attempts off of https://docs.guardrailsai.com/examples/provenance/#provenance-v1

ProvenanceV1 validation works when I supply the default "gpt-3.5-turbo" string instead. I can't immediately see what's different between the two guard setups - the openai_callable function defined in set_callable looks nearly identical to mine. I also can't find any examples of a custom callable anywhere.

Just to clarify: while the OpenAI model string works for me during testing, I can't use it in my application and need to use Google's model instead.

Thanks for any help you can provide! I'll continue digging through the docs and code to figure out what's happening with the validator parsing.

e-m-albright avatar Sep 28 '23 19:09 e-m-albright

Hi! Your assessment is correct: this is a bug in how the feature is documented versus how it actually works. With the current sequencing of guard runs, validator params are serialized and deserialized before they're executed. That round trip is not lossless for callables - the function is reduced to its repr string, which can't be evaluated back into a function. As a result, passing a callable as a validator parameter doesn't work, even though ProvenanceV1's signature accepts a callable of that shape.
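To make the failure mode concrete, here is a minimal standalone reproduction of that lossy round trip (an illustration of the mechanism, not Guardrails code): stringifying a function keeps only its repr, and `eval` on that repr raises the exact `SyntaxError: invalid syntax` seen in the traceback above.

```python
def predict_function(prompt: str) -> str:
    return prompt

# Serializing the callable into a string attribute keeps only its repr,
# e.g. "<function predict_function at 0x7f...>".
serialized = repr(predict_function)
print(serialized.startswith("<function"))  # True

# Deserializing via eval then fails, because a repr like
# "<function ...>" is not a valid Python expression.
try:
    eval(serialized)
except SyntaxError as e:
    print("SyntaxError:", e)
```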

The fix here would be to rewrite the ProvenanceV1 validator to accept llm_callable as metadata instead of as a param, since metadata doesn't pass through serialization.
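Roughly, the metadata approach means the validator looks the callable up at validation time from a metadata dict that never goes through XML serialization. The sketch below shows the shape of that idea with illustrative names (`ProvenanceV1Sketch`, the `validate` signature); it is not the actual Guardrails API.

```python
from typing import Any, Callable, Dict

class ProvenanceV1Sketch:
    """Hypothetical validator that reads its LLM callable from metadata
    at validation time, instead of storing it as a constructor param
    (where it would be serialized into XML and lost)."""

    def validate(self, value: str, metadata: Dict[str, Any]) -> str:
        # The callable survives intact because metadata is passed at
        # call time as a plain Python dict, never round-tripped.
        llm_callable: Callable[[str], str] = metadata["llm_callable"]
        return llm_callable(value)

# Usage: supply the callable alongside the value being validated.
validator = ProvenanceV1Sketch()
result = validator.validate("hello", {"llm_callable": lambda p: p.upper()})
print(result)  # HELLO
```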

This hasn't been prioritized yet, but I'll see if we can get it released in 0.2.5 (sometime next week), and I'll return with a more concrete ETA once we have one. I'll also tag this issue as a "Good first issue" - I think it's something someone can pick up with limited context.

zsimjee avatar Sep 28 '23 21:09 zsimjee

Awesome, thanks @zsimjee!

e-m-albright avatar Sep 29 '23 14:09 e-m-albright

This'll be solved as part of our move away from using XML serialization internally, within the next two weeks.

irgolic avatar Oct 04 '23 16:10 irgolic