Feature to check variable bounds: not just min/max but also multiple bounds
Evidently does not provide a feature to check and compare multiple bounds of numeric variables between the current and reference data. For example, the test should be flagged True if the values of a variable are within the bounds [[5,10],[50,100]], and flagged False for values between 11 and 49.
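To pin down the requested semantics, here is a minimal standalone sketch in plain Python. It is not an Evidently API: the helpers `in_any_bound` and `check_multi_bounds` are hypothetical, and it assumes each bound is an inclusive `(low, high)` pair, so a value passes if it falls inside at least one interval.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical representation: each bound is an inclusive (low, high) pair.
Bound = Tuple[float, float]


def in_any_bound(value: float, bounds: Sequence[Bound]) -> bool:
    """Return True if value lies inside at least one inclusive interval."""
    return any(low <= value <= high for low, high in bounds)


def check_multi_bounds(values: Iterable[float], bounds: Sequence[Bound]) -> List[bool]:
    """Flag each value: True if it is inside some interval, False otherwise."""
    return [in_any_bound(v, bounds) for v in values]


bounds = [(5, 10), (50, 100)]  # the [[5,10],[50,100]] example from the issue
flags = check_multi_bounds([5, 7, 10, 11, 30, 49, 50, 100, 101], bounds)
print(flags)  # [True, True, True, False, False, False, True, True, False]

# A column-level test could pass only if every value is in bounds.
column_passes = all(flags)
print(column_passes)  # False
```

A column-level check like `all(flags)` is one possible aggregation; a share-based threshold (e.g. "at least 95% of values in bounds") would be another design choice.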
It would be great if Evidently could itself create a JSON of variable bounds for the dataset.
Also, if a column is entirely null in either dataset, Evidently fails to generate the report. Instead, it could ignore the column and issue a warning that the column has no values in one of the datasets.
Hi @VaishnaviMendhe,
Thanks for sharing your thoughts!
1. Complex conditions for value range
There is indeed no special metric to set double conditions for the value range. As a workaround, you can test similar complex conditions using regular expressions: the TestColumnRegExp test accepts a regular expression via the reg_exp parameter.
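As a rough illustration of the regex workaround, the pattern below accepts whole numbers in [5, 10] or [50, 100] and rejects 11 through 49. This is a sketch under assumptions: it presumes the values are integers whose string form has no decimal point ("7", not "7.0"; a column with nulls stored as float would break this) and no sign. The helper `matches_bounds` is hypothetical and only demonstrates the pattern with Python's `re`.

```python
import re

# Integers in [5, 10] or [50, 100]:
#   [5-9]        -> 5..9
#   10           -> 10
#   [5-9][0-9]   -> 50..99
#   100          -> 100
# Assumption: values are non-negative whole numbers rendered without ".0".
MULTI_BOUND_PATTERN = r"^(?:[5-9]|10|[5-9][0-9]|100)$"


def matches_bounds(value: int) -> bool:
    """Check a value by matching its string form against the pattern."""
    return re.fullmatch(MULTI_BOUND_PATTERN, str(value)) is not None


results = {v: matches_bounds(v) for v in [4, 5, 10, 11, 49, 50, 99, 100, 101]}
print(results)
# {4: False, 5: True, 10: True, 11: False, 49: False, 50: True, 99: True, 100: True, 101: False}
```

The same pattern string would then be passed as `reg_exp`, for example `TestColumnRegExp(column_name="my_feature", reg_exp=MULTI_BOUND_PATTERN)` with a hypothetical column name; how the test converts numeric values to strings should be checked against the Evidently version in use. Regexes get unwieldy quickly for non-integer or arbitrary ranges, which is why this is only a workaround.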
2. Creating a JSON with test conditions
If I understood you right, you are referring to the ability to:
- export the test conditions that are automatically generated by Evidently (such as derived value ranges) as a separate JSON, and
- for tests to accept the conditions expressed as a JSON (instead of passing the reference dataset / specifying the conditions as parameters)
This is a great feature indeed, and we plan to add it in the future. However, this is a significant change that requires reworking all metrics and tests, so it will take some time to implement.
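Until such an export exists, a user-side approximation is possible. The sketch below derives a min/max range per column from reference data, serializes it to JSON, and later checks current data against the loaded JSON. The JSON layout and the helpers `derive_bounds` and `check_against_bounds` are hypothetical and not Evidently's format; plain dicts of lists (with `None` as null) stand in for DataFrames.

```python
import json
from typing import Dict, List, Optional

Columns = Dict[str, List[Optional[float]]]


def derive_bounds(reference: Columns) -> Dict[str, Dict[str, float]]:
    """Derive a min/max range per column, ignoring nulls (None)."""
    bounds = {}
    for column, values in reference.items():
        present = [v for v in values if v is not None]
        if present:  # all-null columns get no bounds at all
            bounds[column] = {"min": min(present), "max": max(present)}
    return bounds


def check_against_bounds(current: Columns, bounds: Dict[str, Dict[str, float]]) -> Dict[str, bool]:
    """Per column: True if every non-null current value is within the stored range."""
    result = {}
    for column, b in bounds.items():
        present = [v for v in current.get(column, []) if v is not None]
        result[column] = all(b["min"] <= v <= b["max"] for v in present)
    return result


reference = {"age": [21, 35, 58, None], "score": [0.2, 0.9, 0.5]}
current = {"age": [25, 61], "score": [0.3, 0.8]}

bounds_json = json.dumps(derive_bounds(reference), indent=2)  # could be saved to a file
loaded = json.loads(bounds_json)
print(check_against_bounds(current, loaded))  # {'age': False, 'score': True}
```

Storing the bounds as JSON decouples the check from the reference dataset; the same layout could hold a list of `[low, high]` intervals per column to cover the multi-bound case from the original request.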
3. Empty columns in the dataset
We did consider this - it might indeed be more convenient for ad hoc analysis when the results of Reports are reviewed manually. However, we believe an explicit error is a better universal outcome in this scenario - especially when Reports or Test Suites are generated as part of an automated pipeline, and passing an empty column might be unintentional.
Regarding empty columns in the dataset: if I want to create reports on live data over successive time intervals and a column happens to be empty in one interval, Evidently fails to generate the report entirely. This could be controlled by a parameter such as include_empty_cols=True, defaulting to False.
Thanks @VaishnaviMendhe - that's a good point, specifically for near real-time monitoring at short intervals. Cc @emeli-dral to review how we can address this better.
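Until something like the proposed include_empty_cols parameter exists (it is a suggestion in this thread, not an Evidently option), one workaround for interval-based monitoring is to drop all-null columns before building the report and emit a warning for each. The sketch uses plain dicts of lists with `None` as null in place of DataFrames; `drop_empty_columns` is a hypothetical helper. With pandas, `df.dropna(axis=1, how="all")` drops all-NA columns from a single DataFrame.

```python
import warnings
from typing import Dict, List, Optional, Tuple

Columns = Dict[str, List[Optional[float]]]


def drop_empty_columns(reference: Columns, current: Columns) -> Tuple[Columns, Columns, List[str]]:
    """Drop columns that are entirely null (or missing) in either dataset, warning about each."""

    def is_empty(values: List[Optional[float]]) -> bool:
        # A missing column arrives here as [], which also counts as empty.
        return all(v is None for v in values)

    empty = sorted(
        col
        for col in set(reference) | set(current)
        if is_empty(reference.get(col, [])) or is_empty(current.get(col, []))
    )
    for col in empty:
        warnings.warn(f"Column {col!r} has no values in at least one dataset; skipping it.")

    kept_reference = {c: v for c, v in reference.items() if c not in empty}
    kept_current = {c: v for c, v in current.items() if c not in empty}
    return kept_reference, kept_current, empty


reference = {"a": [1, 2], "b": [None, None], "c": [3, 4]}
current = {"a": [1, None], "b": [5, 6], "c": [None, None]}
ref, cur, dropped = drop_empty_columns(reference, current)
print(dropped)  # ['b', 'c']
```

Any column mapping or per-column tests that name a dropped column would also need to skip it, otherwise the report could still fail on that column.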