
[feat] Dynamic Arguments for validators - reference multiple fields from RAIL or output in a validator

abhishek203 opened this issue 1 year ago · 4 comments

Description
All the values generated by the LLM need to be consistent with the other values in the output.

Why is this needed
I am extracting the information in a resume into JSON. The pydantic classes are:

class CompanyInfo(BaseModel):
    name: str = Field(description="Name of the company the applicant has worked with")
    years: int = Field(validators=[ValidRange(min=0, max=50, on_fail='fix')], description="Number of years the applicant worked at that particular company")

class ApplicantInfo(BaseModel):
    name: str = Field(description="Name of the applicant")
    univ: str = Field(description="Name of the university the applicant went to")
    experience: int = Field(validators=[ValidRange(min=0, max=50, on_fail='fix')], description="Total professional experience in years")
    experience_list: List[CompanyInfo] = Field(description="List of companies the applicant has worked for before")
    database_experience: int = Field(validators=[ValidRange(min=0, max=50, on_fail='fix')], description="Total experience with database systems in years")
    python_experience: int = Field(validators=[ValidRange(min=0, max=50, on_fail='fix')], description="Total Python experience in years")

Output:

{
  "name": "E Abhishek",
  "univ": "Indian Institute of Technology Bombay",
  "experience": 3,
  "experience_list": [
    {
      "name": "Enphase Energy",
      "years": 1
    },
    {
      "name": "Warner Bros. Discovery",
      "years": 2
    }
  ],
  "database_experience": 1,
  "python_experience": 4
}

The value of "experience", i.e., total experience should equal the sum of "years" in "experience_list". The value of "database_experience" and "python_experience" should be less than or equal to "experience". I do not see a way to validate these with the current validators.
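For concreteness, the two rules above can be expressed as a plain boolean check over the parsed output. This is a sketch; `check_consistency` is a hypothetical helper, not part of Guardrails:

```python
def check_consistency(output: dict) -> bool:
    """Return True iff the cross-field consistency rules hold."""
    total = output["experience"]
    # Rule 1: total experience equals the sum of per-company years
    if total != sum(c["years"] for c in output["experience_list"]):
        return False
    # Rule 2: skill-specific experience cannot exceed total experience
    if output["database_experience"] > total or output["python_experience"] > total:
        return False
    return True
```

Applied to the example output above, this returns False: the per-company years sum to 3, which matches experience, and database_experience is 1 ≤ 3, but python_experience is 4 > 3.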

Implementation details
The functions in the Guard class should accept a user-defined function as input that checks consistency across the output data.

guard = gd.Guard.from_pydantic(output_class=ApplicantInfo, prompt=prompt, validity_check_fn=validity_check_fn)

This will be passed to the Runner class, where we should define another step for consistency checking inside the step method.

# 3.1. Run output validation using the consistency checking function
validated_fragment = self.validate_fn(
    iteration,
    index,
    parsed_fragment,
    validity_check_fn,
)
# Raise an error if the fragment is not valid

The specification for validity_check_fn is that it accepts the JSON result as input and returns True or False. The validate_fn returns a PassResult if validity_check_fn returns True.
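A minimal sketch of how the proposed step could map the boolean check onto a validation result. PassResult and FailResult are stand-in dataclasses here, not the actual Guardrails types, and run_validity_check is a hypothetical name:

```python
from dataclasses import dataclass

@dataclass
class PassResult:
    pass

@dataclass
class FailResult:
    error_message: str

def run_validity_check(parsed_fragment: dict, validity_check_fn):
    """Map the user-supplied boolean check onto a validation result."""
    if validity_check_fn(parsed_fragment):
        return PassResult()
    return FailResult(error_message="Output failed the user-defined consistency check")
```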

End result
This feature should be used whenever there is a requirement for internally consistent data.

abhishek203 avatar Jan 02 '24 17:01 abhishek203

Hello @abhishek203, this is actually a great enhancement suggestion! Let me get back to you on whether this can be done out of the box today, or whether it is something we could add as a feature in future releases.

thekaranacharya avatar Jan 03 '24 16:01 thekaranacharya

Hi Abhishek!

Yeah, I agree with Karan, this is a really interesting ask. At its essence, the addition to Guardrails would go beyond a validator as it exists today. We would need to add the ability to write a validator for one field that references multiple other fields. In your example, the validator you're asking for could have a signature that looks something like

def validate(value, referenced_fields):
    '''
    referenced_fields is a dict of str -> number: the key is the field id,
    the value is that field's current value.
    '''
    ...

Orchestrating that is a large task because it requires new interfaces and plumbing to pass those populated references around. There is an existing way to do this with pydantic, and we could likely follow the same syntax, but we would need to also translate that over to RAIL. We'll look into whether this works with our current pydantic integration.

Bigger picture, this becomes easier if we split up validation into multiple, more atomic steps. i.e. being able to write a guardrail as a pipeline that looks like

input validate -> prompt llm -> verify schema -> verify fields -> further field verification

with hooks at different stages that allow for throw, reask, etc.

zsimjee avatar Jan 09 '24 18:01 zsimjee

So pydantic has this

https://docs.pydantic.dev/2.5/concepts/validators/#model-validators

It looks like it would fix the problem, but I'm not sure that it works with the guardrails pydantic integration.

The pydantic syntax is pretty great for python, and it makes a lot of sense to port into this library. I also think that it gives us a good framework for thinking about how this works in RAIL (i.e. declaratively) as an extra step that exists as a validator on the output tag as opposed to an internal object within the output tag.

Thinking through this, in your use case, I think we should be able to get this consistency validator working if:

  1. You create a third pydantic model that wraps ApplicantInfo
  2. You register a validator that you build by inheriting BaseValidator. That validator will have access to all fields, and it will run after all the validations on the internal types complete, since our validation runs inside-out.
@register_validator
class ValidateForExperienceConsistency(BaseValidator):
    def validate(...):
        ...

class ApplicantInfoWrapper(BaseModel):
    applicant_summary: ApplicantInfo = Field(validators=[ValidateForExperienceConsistency()])

zsimjee avatar Jan 09 '24 18:01 zsimjee

@zsimjee @thekaranacharya

Approach 1: Building a wrapper around my class

@register_validator(name="consistencycheck", data_type="ApplicantInfo")
class ValidateForExperienceConsistency(Validator):
    def __init__(self, on_fail: Optional[Callable] = None):
        super().__init__(on_fail=on_fail)

    def validate(self, value: Any, metadata: Dict):
        if value.experience < value.python_experience:
            return FailResult(error_message="Total experience is less than python experience")
        return PassResult()

Output

ValueError: Data type ApplicantInfo is not registered.

We cannot do it because the data_type needs to be registered.

Approach 2: Using model_validator from pydantic

class ApplicantInfo(BaseModel):
    name: str = Field(description="Name of the applicant")
    univ: str = Field(description="Name of the university the applicant went to")
    experience: int = Field(validators=[ValidRange(min=0, max=50, on_fail='fix')], description="Total professional experience in years")
    experience_list: List[CompanyInfo] = Field(description="List of companies the applicant has worked for before")
    database_experience: int = Field(validators=[ValidRange(min=0, max=50, on_fail='fix')], description="Total experience with database systems in years")
    python_experience: int = Field(validators=[ValidRange(min=0, max=50, on_fail='fix')], description="Total Python experience in years")

    @model_validator(mode='after')
    def validate_experience(self):
        if self.experience < self.python_experience:
            raise ValueError('exp mismatch')
        return self

Using the above class does not raise any error, but the results are still not consistent.

I believe the existing tools are not sufficient to solve this problem; we might have to change things in guardrails itself. If the team plans to take this up, I would be happy to contribute.
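In the meantime, one interim workaround is to run the consistency check outside Guardrails after the guarded call returns, and retry manually when it fails. This is a sketch; `ensure_consistent` and `call_guard` are hypothetical names, and the retry here is a plain re-invocation rather than a Guardrails reask:

```python
def ensure_consistent(call_guard, checks, max_retries=2):
    """Call an LLM wrapper until every cross-field check passes, or give up.

    call_guard: zero-argument callable returning the validated output dict.
    checks: list of (name, predicate) pairs evaluated against that dict.
    """
    for _ in range(max_retries + 1):
        output = call_guard()
        failures = [name for name, pred in checks if not pred(output)]
        if not failures:
            return output
    raise ValueError(f"Inconsistent output after retries: {failures}")
```

Because the check runs on the final dict, it can relate any fields to each other, which is exactly what per-field validators cannot do today.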

abhishek203 avatar Jan 15 '24 14:01 abhishek203

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar Aug 22 '24 03:08 github-actions[bot]

This issue was closed because it has been stalled for 14 days with no activity.

github-actions[bot] avatar Sep 05 '24 03:09 github-actions[bot]