autolabel
Validate the labeling config being sent to the Labeling Agent
Currently, the labeling config sent to the Labeling Agent can contain typos in its keys. A misspelled key is silently ignored and the default value is used instead, without informing the user.
We should validate all the keys passed in the config and check that they are actual keys. Any unrecognized keys should be flagged by the Labeling Agent.
For example, if there is a typo in explanation_column and it is written as "expanatn_colums", the Labeling Agent should raise an error saying the key was not found.
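A minimal sketch of the requested behavior, using only the standard library. The key names in ALLOWED_KEYS and the helper names find_unknown_keys and validate_keys are hypothetical and chosen for illustration; the real config may accept a different set of keys.

```python
from difflib import get_close_matches

# Hypothetical set of accepted keys; the real config may define different ones.
ALLOWED_KEYS = {"task_name", "label_column", "explanation_column"}


def find_unknown_keys(config: dict) -> dict:
    """Map each unrecognized key to its closest allowed key, or None."""
    unknown = {}
    for key in config:
        if key not in ALLOWED_KEYS:
            matches = get_close_matches(key, sorted(ALLOWED_KEYS), n=1)
            unknown[key] = matches[0] if matches else None
    return unknown


def validate_keys(config: dict) -> None:
    """Raise ValueError for unrecognized keys instead of silently using defaults."""
    unknown = find_unknown_keys(config)
    if unknown:
        details = ", ".join(
            repr(key) + (f" (did you mean {hint!r}?)" if hint else "")
            for key, hint in unknown.items()
        )
        raise ValueError(f"Unknown config keys: {details}")
```

Calling validate_keys({"expanatn_colums": "why"}) raises a ValueError that names the misspelled key, so the typo no longer falls back to a default unnoticed.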
@rajasbansal Do you have a schema validation tool in mind, or would Pydantic work?
@nihit - any thoughts on a schema validation library to use? @Sardhendu is interested in taking on this issue.
Either the jsonschema validator or another library works. I like pydantic because it's pythonic and because of its data model syntax. Additionally, pre- and post-init checks are good to have.
Others I found are Cerberus and Voluptuous, but I have never used them.
Hey @Sardhendu, appreciate you taking this on! This is super important for improving the usability and reliability of the library.
Currently the base config class has a dummy _validate function here: https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/configs/base.py#L30, which we'll need to override in the downstream class (https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/configs/config.py).
We could use jsonschema validation for this: define the expected schema for the config, and use something like https://python-jsonschema.readthedocs.io/en/latest/validate/ for validation at runtime when the config object is passed in.
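As a rough sketch of the jsonschema approach, assuming the jsonschema package is installed: the schema below is hypothetical (the property names are illustrative, not the actual config schema), and validate_config is a made-up helper standing in for whatever the overridden _validate would do. Setting "additionalProperties" to False is what makes misspelled keys fail validation.

```python
from jsonschema import ValidationError, validate

# Hypothetical schema; the real config would list its own properties.
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "task_name": {"type": "string"},
        "label_column": {"type": "string"},
        "explanation_column": {"type": "string"},
    },
    "required": ["task_name"],
    # Reject any key that is not declared under "properties".
    "additionalProperties": False,
}


def validate_config(config: dict) -> None:
    """Validate a config dict against CONFIG_SCHEMA, raising ValueError on failure."""
    try:
        validate(instance=config, schema=CONFIG_SCHEMA)
    except ValidationError as err:
        raise ValueError(f"Invalid labeling config: {err.message}") from err
```

With this schema, a config containing "expanatn_colums" is rejected at validation time rather than being accepted with a default value for the explanation column.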
@nihit Sure let me take a look.