evidently

Feature to check variable bounds, not just min max but also multi bounds

Open VaishnaviMendhe opened this issue 1 year ago • 3 comments

Evidently does not provide a feature to check and compare multiple bounds of numeric variables between the current data and the reference data. For example, the test should be flagged true if the variable's values fall within the bounds [[5,10],[50,100]], and flagged false for values between 11 and 49.
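To make the requested semantics concrete, here is a minimal Python sketch of a multi-interval check. The helper names `within_any_bound` and `check_multi_bounds` are hypothetical and not part of Evidently. The sketch assumes closed intervals and treats a column as passing only if every value falls inside at least one of them.

```python
from typing import Iterable, Sequence

# Hypothetical helpers: Evidently has no such test; this only illustrates the requested behaviour.
Bounds = Sequence[Sequence[float]]


def within_any_bound(value: float, bounds: Bounds) -> bool:
    """Return True if value lies in at least one closed interval [low, high]."""
    return any(low <= value <= high for low, high in bounds)


def check_multi_bounds(values: Iterable[float], bounds: Bounds) -> bool:
    """Pass only if every value falls inside one of the allowed intervals."""
    return all(within_any_bound(v, bounds) for v in values)


bounds = [[5, 10], [50, 100]]
print(check_multi_bounds([5, 7, 10, 60, 100], bounds))  # True
print(check_multi_bounds([7, 30], bounds))              # False: 30 falls in the gap
```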

It would be great if Evidently could itself create a JSON with the variable bounds in the dataset.
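One way a tool could derive such bounds automatically is sketched below: sort the reference values, start a new interval wherever two neighbouring values are more than `max_gap` apart, then dump the result as JSON. Everything here (`derive_intervals`, `export_bounds`, the `max_gap` threshold and the JSON layout) is a hypothetical illustration, not an Evidently feature.

```python
import json
from typing import Dict, List, Optional


def derive_intervals(values: List[Optional[float]], max_gap: float) -> List[List[float]]:
    """Group sorted reference values into intervals; a gap wider than max_gap starts a new one."""
    clean = sorted(v for v in values if v is not None)  # ignore missing values
    if not clean:
        return []
    intervals = [[clean[0], clean[0]]]
    for v in clean[1:]:
        if v - intervals[-1][1] > max_gap:
            intervals.append([v, v])  # gap too wide: open a new interval
        else:
            intervals[-1][1] = v      # extend the current interval
    return intervals


def export_bounds(reference: Dict[str, List[Optional[float]]], max_gap: float) -> str:
    """Serialize the derived intervals for every column as a JSON string."""
    return json.dumps({col: derive_intervals(vals, max_gap) for col, vals in reference.items()})


reference = {"feature_a": [5, 6, 8, 10, 50, 62, 80, 100]}
print(export_bounds(reference, max_gap=20))
# {"feature_a": [[5, 10], [50, 100]]}
```

The result depends heavily on `max_gap`; a real implementation would need a more principled way to decide where one "allowed" region ends.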

Also, if a column is entirely null in any dataset, Evidently fails to generate the report. Instead, it could ignore the column and issue a warning that the column is empty or missing.

VaishnaviMendhe, Sep 04 '23 12:09

Hi @VaishnaviMendhe,

Thanks for sharing your thoughts!

1. Complex conditions for value range

There is indeed no dedicated metric to set a multi-range condition on the value range. As a workaround, you can test similar complex conditions with regular expressions: the TestColumnRegExp test accepts a regular expression via the reg_exp parameter.
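As an illustration of the regex workaround for the bounds in the original request, the pattern below matches integers from 5 to 10 or from 50 to 100 written in plain form (no sign, leading zeros or decimals). The sketch only checks the pattern with Python's `re` module; whether TestColumnRegExp applies it to numeric columns directly or needs them cast to strings first is an assumption to verify in the Evidently documentation.

```python
import re

# Integers 5-10 or 50-100 in plain decimal form; this pattern string is what would go into reg_exp.
IN_RANGE_PATTERN = r"^(?:[5-9]|10|[5-9][0-9]|100)$"
IN_RANGE = re.compile(IN_RANGE_PATTERN)


def matches(value: int) -> bool:
    """Check a value against the pattern via its string representation."""
    return bool(IN_RANGE.match(str(value)))


print([v for v in [4, 5, 10, 11, 49, 50, 99, 100, 101] if matches(v)])
# [5, 10, 50, 99, 100]
```

The regex approach gets unwieldy quickly: decimals, negative numbers or arbitrary bounds each need a hand-written pattern, which is why it is only a workaround.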

2. Creating a JSON with test conditions

If I understood you right, you are referring to the ability to:

  • export the test conditions that are automatically generated by Evidently (such as derived value ranges) as a separate JSON, and
  • for tests to accept the conditions expressed as a JSON (instead of passing the reference dataset / specifying the conditions as parameters)
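From the user's side, the second bullet could look roughly like this: conditions stored as JSON and applied to the current data without passing a reference dataset. The JSON layout (`{column: [[low, high], ...]}`) and `run_bound_tests` are hypothetical and say nothing about how Evidently would actually implement the feature.

```python
import json
from typing import Dict, List

# Hypothetical condition format, not an Evidently format.
CONDITIONS_JSON = '{"feature_a": [[5, 10], [50, 100]]}'


def run_bound_tests(current: Dict[str, List[float]], conditions_json: str) -> Dict[str, List[float]]:
    """Return, per column, the current values that fall outside all allowed intervals."""
    conditions = json.loads(conditions_json)
    failures = {}
    for column, intervals in conditions.items():
        failures[column] = [
            v for v in current.get(column, [])
            if not any(low <= v <= high for low, high in intervals)
        ]
    return failures


current = {"feature_a": [6, 12, 55, 120]}
print(run_bound_tests(current, CONDITIONS_JSON))
# {'feature_a': [12, 120]}
```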

This is a great feature indeed, and we plan to add it in the future. However, this is a significant change that requires reworking all metrics and tests, so it will take some time to implement.

3. Empty columns in the dataset

We did consider this - it might indeed be more convenient for ad hoc analysis when the results of Reports are reviewed manually. However, we believe an explicit error is a better universal outcome in this scenario - especially when Reports or Test Suites are generated as part of an automated pipeline, and passing an empty column might be unintentional.

elenasamuylova, Sep 06 '23 13:09

Regarding empty columns in the dataset: say I want to create reports on live data for a series of time intervals. If a column is empty in one specific interval, Evidently fails to generate the report entirely. This could be controlled by a parameter, e.g. include_empty_cols=True, with the default set to False.

VaishnaviMendhe, Sep 07 '23 04:09

Thanks @VaishnaviMendhe - that's a good point, specifically for near real-time monitoring at short intervals. Cc @emeli-dral to review how we can address this better.

elenasamuylova, Sep 07 '23 17:09