autolabel Validate the labeling config being sent to the Labeling Agent

Open rajasbansal opened this issue 2 years ago • 2 comments

Currently, the labeling config sent to the Labeling Agent can contain typos in its keys. A misspelled key is silently ignored, and the default value is used without informing the user.

We should validate every key passed in the config and check that it is an actual key. Any unused keys should be flagged by the Labeling Agent.

For example, if there is a typo while filling in the explanation_column and we call it "expanatn_colums", we should get an error from the Labeling Agent saying the key was not found.
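A minimal sketch of the check described above, using only the standard library. The key names in `KNOWN_KEYS` and the helper names `find_unknown_keys` and `validate_keys` are hypothetical and not taken from the autolabel codebase; the real set of supported keys would come from the config definition.

```python
import difflib

# Hypothetical set of supported keys, for illustration only.
KNOWN_KEYS = {"task_name", "label_column", "explanation_column"}


def find_unknown_keys(config: dict, known_keys=KNOWN_KEYS) -> dict:
    """Map each unrecognized key to its closest known key, or None."""
    unknown = {}
    for key in config:
        if key not in known_keys:
            # Suggest the most similar known key, if any is close enough.
            matches = difflib.get_close_matches(key, sorted(known_keys), n=1)
            unknown[key] = matches[0] if matches else None
    return unknown


def validate_keys(config: dict, known_keys=KNOWN_KEYS) -> None:
    """Raise KeyError for unrecognized keys instead of silently using defaults."""
    unknown = find_unknown_keys(config, known_keys)
    if unknown:
        details = ", ".join(
            f"{key!r} (did you mean {hint!r}?)" if hint else repr(key)
            for key, hint in unknown.items()
        )
        raise KeyError(f"Unknown config keys: {details}")
```

Calling `validate_keys({"task_name": "demo", "expanatn_colums": "why"})` would raise a `KeyError` naming the misspelled key rather than falling back to the default.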

Jun 02 '23 22:06 rajasbansal

@rajasbansal Do you have a schema validation tool in mind, or would Pydantic work?

Jun 21 '23 23:06 Sardhendu

@nihit - any thoughts on a schema validation library to use? @Sardhendu is interested in taking on this issue.

Jun 22 '23 17:06 rishabh-bhargava

Either a jsonschema validator or another tool works. I like Pydantic because it's Pythonic and because of its data model syntax. Additionally, pre- and post-init checks are good to have.

Others I found are Cerberus and Voluptuous, but I have never used them.

Jun 22 '23 18:06 Sardhendu

Hey @Sardhendu, appreciate you taking this on! This is super important for improving the usability and reliability of the library.

Currently the base config class has a dummy _validate function here: https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/configs/base.py#L30, that we'll need to override in the downstream class (https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/configs/config.py).

We can use jsonschema validation for this - define the expected schema for the config, and use something like https://python-jsonschema.readthedocs.io/en/latest/validate/ for validation at runtime when the config object is passed in?
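A rough sketch of what this could look like with the `jsonschema` package. The schema below is hypothetical (the property names are illustrative, not the actual autolabel config layout), and `collect_config_errors` is an assumed helper name; something like it could be called from the overridden `_validate`.

```python
from jsonschema import Draft7Validator

# Hypothetical schema; the real one would list the actual config keys.
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "task_name": {"type": "string"},
        "explanation_column": {"type": "string"},
    },
    "required": ["task_name"],
    # Reject any key not listed under "properties", which catches typos.
    "additionalProperties": False,
}


def collect_config_errors(config: dict) -> list:
    """Return one human-readable message per schema violation."""
    validator = Draft7Validator(CONFIG_SCHEMA)
    return [error.message for error in validator.iter_errors(config)]
```

Using `iter_errors` instead of `jsonschema.validate` reports every problem in one pass, so a user with several typos sees them all at once rather than fixing them one at a time.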

Jun 22 '23 22:06 nihit

@nihit Sure let me take a look.

Jun 23 '23 00:06 Sardhendu
