evidently
The default behavior using a reference in TestShareOfOutRangeValues
Hello, the issue is: in the documentation (All Tests), the test TestShareOfOutRangeValues is described as having this default configuration:
With reference: the test fails if over 10% of values are out of range.
However, when we run the test:
data_quality = TestSuite(tests=[
    TestShareOfOutRangeValues(column_name='HouseAge')
])
data_quality.run(reference_data=ref, current_data=curr, column_mapping=schema)
data_quality.as_dict()
The result is:
'version': '0.1.58.dev0',
'datetime': '2022-10-27T09:57:01.408264',
'tests': [{'name': 'Share of Out-of-Range Values',
'description': 'The share of values out of range in the column **HouseAge** is 0.0002 (1 out of 5000). The test threshold is eq=0 ± 1e-12.',
'status': 'FAIL',
The test is not using the reference as the threshold value. The condition is eq=0, as shown in the source code (TestShare Source Code):
class TestShareOfOutRangeValues(BaseDataQualityValueRangeMetricsTest):
    name = "Share of Out-of-Range Values"

    def get_condition(self) -> TestValueCondition:
        # A user-supplied condition takes precedence over the default.
        if self.condition.has_condition():
            return self.condition
        # Default: the share of out-of-range values must be (approximately) 0.
        return TestValueCondition(eq=approx(0))
Thanks a lot for raising the issue @samuelamico! It is a mistake in the documentation.
By default, the test does use the reference (to learn the reference value ranges) but expects all values in the current data to stay in this range. I added a PR to update the docs to match the current implementation: https://github.com/evidentlyai/evidently/pull/425
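To make the default behavior described above concrete, here is a minimal sketch in plain Python. It is not Evidently's implementation: the helper `share_out_of_range` is hypothetical, and it assumes the reference range is simply the minimum and maximum of the reference column.

```python
# Illustrative sketch only; not Evidently's code.
def share_out_of_range(current, left, right):
    """Return the fraction of values in `current` outside [left, right]."""
    if not current:
        raise ValueError("current must not be empty")
    outside = sum(1 for v in current if v < left or v > right)
    return outside / len(current)


reference = [5, 12, 20, 33, 41, 52]
current = [7, 15, 52, 60]

# Assumption: the range is learned as the min and max of the reference data.
left, right = min(reference), max(reference)
share = share_out_of_range(current, left, right)

# Default condition from the thread: any out-of-range value fails the test.
default_passes = share == 0
print(share, default_passes)
```

Here one of the four current values (60) falls outside the reference range, so the default condition fails even though only a single value is out of range.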
Hi, do you plan to make the margin configurable? I.e. "the test fails if over XX% of values are out of range"?
Thanks
Hi @anh-le-profinit,
It is possible to configure custom conditions for all tests. Here is the documentation: https://docs.evidentlyai.com/user-guide/tests-and-reports/custom-test-suite#3.-set-test-conditions
For example, if you want the test to fail if more than 10% of values in the column "age" are out of range (with the range derived automatically from the reference dataset):
my_tests = TestSuite(tests=[
    TestShareOfOutRangeValues(column_name='age', lt=0.1),
])
If you also want to set the range of the feature values manually (for example, from 10 to 80):
my_tests = TestSuite(tests=[
    TestShareOfOutRangeValues(column_name='age', left=10, right=80, lt=0.1),
])
Thanks Elena,
my mistake, my question was about similar but slightly different tests: TestColumnShareOfMissingValues and TestMostCommonValueShare.
There, a reference dataset is used to set a reference metric, and the tests check whether the current metric is within a certain range of it. The range is currently fixed at 10% around the reference, which can be very strict for low reference values. Is there a way to relax this constraint (or do you plan to introduce one in the future)?
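A quick arithmetic sketch shows why a fixed relative band is tight for small reference values. It assumes the band is ±10% of the reference value, as described in the question; `relative_band` is a hypothetical helper, not an Evidently function.

```python
# Illustrative arithmetic only; `relative_band` is a hypothetical helper.
def relative_band(reference_value, relative=0.1):
    """Return (low, high) bounds at +/- `relative` around the reference value."""
    delta = abs(reference_value) * relative
    return reference_value - delta, reference_value + delta


for ref_share in (0.01, 0.5):
    low, high = relative_band(ref_share)
    print(f"reference={ref_share}: allowed {low:.3f} to {high:.3f}")
```

With a reference share of 0.01 the allowed window is only about 0.002 wide, while a reference share of 0.5 gives a window about 0.1 wide.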
Hi @anh-le-profinit,
It works exactly the same - you can pass custom conditions to any Evidently Test.
For example, if you want the test to fail when the share of missing values is >= 20%, here is how you do that:
my_tests = TestSuite(tests=[
    TestColumnShareOfMissingValues(column_name='age', lt=0.2),
])
Here are the docs on the standard parameters you can use to set test conditions (lt, gt, eq, etc.):
https://docs.evidentlyai.com/user-guide/tests-and-reports/custom-test-suite#3.-set-test-conditions
In this case you set the Test condition without comparing it to the reference: the Test simply checks whether the total share of missing values in the current dataset is below 20%.
It is not currently possible to set a different condition relative to the reference automatically. If you want to set a condition as +/-20% from the reference, you first need to derive the share of missing values in your reference dataset, and then use approx (explained here: https://docs.evidentlyai.com/user-guide/tests-and-reports/custom-test-suite#custom-conditions-with-approx). Here is how you set the boundary as 5 +/-20%:
lt=approx(5, relative=0.2)
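The manual step described above (derive the reference share first, then compare with a relative tolerance) could be sketched as follows in plain Python. The helpers `share_of_missing` and `within_relative` are hypothetical, missing values are assumed to be represented as `None`, and the comparison only approximates what a relative tolerance does; it is not Evidently's `approx`.

```python
# Illustrative sketch only; helper names are hypothetical.
def share_of_missing(values):
    """Return the fraction of entries that are None."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)


def within_relative(value, reference_value, relative):
    """True if `value` lies within +/- `relative` of `reference_value`."""
    return abs(value - reference_value) <= relative * abs(reference_value)


reference_age = [34, None, 51, 28, 45, None, 39, 60, 22, 47]
current_age = [31, None, 44, None, 58, 29, None, 36, 40, 52]

ref_share = share_of_missing(reference_age)  # 2 of 10 values are missing
cur_share = share_of_missing(current_age)    # 3 of 10 values are missing

# Check the current share against the reference share with a 20% relative tolerance.
ok = within_relative(cur_share, ref_share, relative=0.2)
print(ref_share, cur_share, ok)
```

Once the reference share is known, it can be passed as the first argument to approx in the test condition, in the same way as the value 5 in the line above.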
We plan to add the ability to set the condition relative to the reference in the future.
Great, this answers my question :)
> We plan to add the ability to set the condition relative to the reference in the future.
Looking forward to that. Thanks for all the clarifications.