autolabel
Validate the labeling config being sent to the Labeling Agent
Currently, the labeling config sent to the Labeling Agent can contain typos in its keys. When that happens, the default value is silently used and the user is never informed.
We should validate every key passed in the config and check that it is a recognized key. Any unrecognized keys should be flagged by the Labeling Agent.
For example, if there is a typo in explanation_column and it is written as "expanatn_colums", the Labeling Agent should raise an error saying the key was not found.
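A rough sketch of the behavior requested above, using only the standard library. The key names in `ALLOWED_KEYS` and both helper functions are hypothetical and only illustrate the idea; the real config defines its own keys and nesting.

```python
import difflib

# Hypothetical set of accepted keys; the real config defines its own.
ALLOWED_KEYS = {"label_column", "explanation_column", "delimiter"}


def find_unknown_keys(config: dict) -> dict:
    """Map each unrecognized key to its closest known key, or None."""
    unknown = {}
    for key in config:
        if key not in ALLOWED_KEYS:
            # Suggest at most one known key that looks similar to the typo.
            matches = difflib.get_close_matches(key, sorted(ALLOWED_KEYS), n=1)
            unknown[key] = matches[0] if matches else None
    return unknown


def validate_keys(config: dict) -> None:
    """Raise ValueError instead of silently falling back to defaults."""
    unknown = find_unknown_keys(config)
    if unknown:
        details = ", ".join(
            f"{key!r} (did you mean {hint!r}?)" if hint else repr(key)
            for key, hint in unknown.items()
        )
        raise ValueError(f"Unknown config keys: {details}")
```

Calling `validate_keys({"label_column": "label", "expanatn_colums": "why"})` would raise instead of quietly using the default for the explanation column.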
@rajasbansal Do you have a schema validation tool in mind, or does Pydantic work?
@nihit - any thoughts on a schema validation library to use? @Sardhendu is interested in taking on this issue.
Either a jsonschema validator or something else works. I like pydantic because it's pythonic and because of its data model syntax. Additionally, pre- and post-init checks are good to have.
Others I found are Cerberus and Voluptuous, but I have never used them.
Hey @Sardhendu, appreciate you taking this on! This is super important for improving the usability and reliability of the library.
Currently the base config class has a dummy _validate function here: https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/configs/base.py#L30, that we'll need to override in the downstream class (https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/configs/config.py).
We can use jsonschema validation for this - define the expected schema for the config, and use something like https://python-jsonschema.readthedocs.io/en/latest/validate/ for validation at runtime when the config object is passed in?
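For illustration, a minimal sketch of what the jsonschema approach could look like. The schema below is hypothetical (flat, with made-up key names and types), and `config_errors` is an illustrative helper, not part of autolabel; the real schema would mirror the actual config structure. The important piece is `"additionalProperties": False`, which makes unrecognized keys a validation error.

```python
from typing import List

from jsonschema import Draft7Validator

# Hypothetical schema: the real config has its own keys and nesting.
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "label_column": {"type": "string"},
        "explanation_column": {"type": "string"},
    },
    "required": ["label_column"],
    # Reject any key that is not listed under "properties".
    "additionalProperties": False,
}


def config_errors(config: dict) -> List[str]:
    """Return every schema violation as a human-readable message."""
    validator = Draft7Validator(CONFIG_SCHEMA)
    return [error.message for error in validator.iter_errors(config)]
```

Using `iter_errors` rather than `validate` collects all problems in one pass, so a user with several typos sees them together. An overridden `_validate` could raise if the returned list is non-empty.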
@nihit Sure, let me take a look.