pytest-regressions
Support for "ignoring a list of specific fields"
I was using this tool to generate our regression baselines, the result is a JSON string like:
```json
{
  "id": "4163B50C4DD9911804894F827078B218",
  "instanceId": "020B24C54D891A89B44AAABDBCFCE057",
  "status": 1
}
```
The problem is that some fields, such as `instanceId`, contain a UUID that will always change, and I don't want this false alert. Possible workarounds would be:

- We provide `ignores=["instanceId"]` to the `data_regression` function, and the value of this field will be ignored when comparing the result.
- Or, we can replace the value of a specific field with a special string, such as `"{% ignore %}"` or something else.
Hi @hdw868,
Couldn't you preprocess the data yourself before passing it to the fixture? Btw, are you using `data_regression` or `file_regression`?
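For the data in the original report, that preprocessing can be a small recursive helper written in the test itself. A minimal sketch (`drop_keys` is a hypothetical name, not part of pytest-regressions):

```python
# Hypothetical helper: recursively drop volatile keys before handing
# the data to data_regression.check. Not part of pytest-regressions.
def drop_keys(data, ignored):
    if isinstance(data, dict):
        return {k: drop_keys(v, ignored) for k, v in data.items() if k not in ignored}
    if isinstance(data, list):
        return [drop_keys(item, ignored) for item in data]
    return data


result = {
    "id": "4163B50C4DD9911804894F827078B218",
    "instanceId": "020B24C54D891A89B44AAABDBCFCE057",
    "status": 1,
}
stable = drop_keys(result, {"id", "instanceId"})
# stable now contains only the deterministic fields; pass it to the fixture.
```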
Hi @nicoddemus, Yes, I could process the data before passing it to the fixture, but usually those are very common fields such as `"id"`, `"creationDate"`, etc. I think it would be much more convenient to provide a list of keys to be ignored instead of calling such a preprocessing function.
Hi @hdw868,
While it might seem convenient to ignore just some top-level keys, this doesn't cover nested structures:

```json
{
  "orders": [
    {
      "id": "aksjsnj-10291"
    }
  ],
  "product": {
    "id": "215"
  }
}
```

If a user wants to remove the `id` key from `orders` but not from `product`, they will need to do it themselves anyway.
For the simple use case, the proposal to ignore just a top-level key, the code:

```python
del data["id"]
data_regression.check(data)
```

becomes:

```python
data_regression.check(data, ignore_key=["id"])
```

which is not a big win IMHO.
So particularly I'm 👎 on adding this, but let's hear what others think.
We had the same issue using `data_regression` for SQLAlchemy models.
Below is a recipe for a new fixture based on `data_regression` that ignores some fields of the data given to the `check` method. The recipe uses regex patterns, but you can simplify it for your own purposes. You can write this code into your project's `conftest.py`, update the class member `fields_to_ignore`, and use `db_model_regression.check` instead.
```python
import re

import pytest


@pytest.fixture()
def db_model_regression(data_regression):
    return DbModelRegression(data_regression)


class DbModelRegression:
    fields_to_ignore = ['updated_on', 'created_on', r'\w*_id', '^id$']

    def __init__(self, data_regression):
        self.data_regression = data_regression

    def check(self, data, **kwargs):
        return self.data_regression.check(self._prepare_for_regression(data), **kwargs)

    def _should_record(self, field_name):
        for pattern in self.fields_to_ignore:
            if re.match(pattern, field_name):
                return False
        return True

    def _prepare_for_regression(self, data):
        if isinstance(data, list):
            return [self._prepare_for_regression(item) for item in data]
        elif isinstance(data, dict):
            return self._prepare_dict(data)
        else:
            return data

    def _prepare_dict(self, data):
        prepared_data = {}
        for key in filter(self._should_record, data):
            value = data[key]
            if isinstance(value, (list, dict)):
                prepared_data[key] = self._prepare_for_regression(value)
            else:
                prepared_data[key] = value
        return prepared_data
```
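The regex side of the recipe is worth checking in isolation: `re.match` anchors at the start of the string only, so the patterns behave slightly differently from full-string matching. This standalone sketch mirrors the `_should_record` logic above (names here are illustrative):

```python
import re

# Same patterns as the recipe's fields_to_ignore class member.
FIELDS_TO_IGNORE = ['updated_on', 'created_on', r'\w*_id', '^id$']


def should_record(field_name):
    # re.match anchors at the start of the string: 'id' is dropped by
    # '^id$', 'user_id' by r'\w*_id', while 'identifier' is still kept.
    return not any(re.match(pattern, field_name) for pattern in FIELDS_TO_IGNORE)
```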
I agree with you that we must find a more elegant way to address that. But I don't think that adding an `ignores` option is the answer. Maybe adding some high-level fixtures that know how to deal with objects instead of dicts. I'll try to put some thought into it.
> While it might seem convenient to ignore just some top-level keys, this doesn't cover nested structures. [...] So particularly I'm 👎 on adding this, but let's hear what others think.
Yes, providing only an ignore option may make things complicated anyway; however, the workaround provided by @igortg seems to have the same problem. Just wondering if there is a more elegant way to handle this situation.
A more elegant way would be building your own serializer to convert objects to dicts without the unwanted fields. Pydantic has something like that, but you have to specify the attributes you want to ignore in the nested objects as well.
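Such a serializer can be sketched in plain Python without pydantic. This is only an illustration, assuming objects expose their fields via `__dict__`; all names are hypothetical, and unlike pydantic's per-level `exclude`, this version drops the excluded names at every nesting level:

```python
# Illustrative serializer: converts plain objects to dicts, skipping
# excluded field names at every nesting level.
def to_dict(obj, exclude=frozenset()):
    if isinstance(obj, list):
        return [to_dict(item, exclude) for item in obj]
    if hasattr(obj, "__dict__"):
        obj = vars(obj)
    if isinstance(obj, dict):
        return {k: to_dict(v, exclude) for k, v in obj.items() if k not in exclude}
    return obj


class Product:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class Order:
    def __init__(self, id, product):
        self.id = id
        self.product = product


order = Order("aksjsnj-10291", Product("215", "widget"))
data = to_dict(order, exclude={"id"})
# data == {"product": {"name": "widget"}}, ready for data_regression.check
```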
Hmm, would adding exclude options like pydantic's be useful? I may try to create a PR if it's worth trying.
As an idea, I think it would be nice to have a more generic solution that can handle not just ignoring specific fields, but also custom comparison behaviour. For example, one may want to be able to:

- ignore the value of `id` but assert that the `id` field is present
- ignore the value but assert that it is a positive integer
- assert that a float equals the expected value with some tolerance
- assert that a list contains the expected items, ignoring order
- assert that a string matches a regex
One way to achieve this could be by providing a way to override fields' values with any object that defines an `__eq__` method. Something like:

```python
from unittest.mock import ANY
from pytest import approx

data = {
    "orders": [{
        "id": "aksjsnj-10291",
    }],
    "product": {
        "id": "215",
    },
    "gravity": 9.81,
    # ...100500 other fields
}

data_regression.check(
    data,
    {'product': {'id': ANY}, 'gravity': approx(10, 0.1)}
)
```
> One way to achieve this could be by providing a way to override fields' values with any object that defines an `__eq__` method.

At first glance I like this idea; it seems like a nice general way to customize how things are compared.
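The mechanics behind the proposal rely on Python comparing dict values with `==`, so any matcher object with a suitable `__eq__` works. A stdlib-only sketch (the `Approx` class here is hypothetical and only stands in for `pytest.approx`):

```python
from unittest.mock import ANY


class Approx:
    """Hypothetical matcher: compares equal to any number within a tolerance."""

    def __init__(self, expected, tol):
        self.expected = expected
        self.tol = tol

    def __eq__(self, other):
        return isinstance(other, (int, float)) and abs(other - self.expected) <= self.tol


actual = {"id": "aksjsnj-10291", "gravity": 9.81}
expected = {"id": ANY, "gravity": Approx(10, 0.5)}
# Dict equality compares values with ==, which dispatches to the
# matchers' __eq__: ANY matches anything, Approx checks the tolerance.
assert expected == actual
```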
I forked the library (https://github.com/adamzev/pytest-regressions) and implemented a solution that works for my use case. I focused just on data regression YAML files. For lines you want to ignore, you can add a `# ignore` comment on them in the YAML file, and differences in those lines will be ignored. So that you don't have to manually mark which lines to ignore, I added a `--force-ignore` flag, which tags every line currently showing up as different (for me, running the tests in a different order produces different model ids, which can be deeply nested keys or values) to be ignored in the future.
If this would work for other people, I'd be happy to look at what needs to be done to get it merged upstream.
FWIW, here is a relevant discussion in a similar library: https://github.com/syrusakbary/snapshottest/issues/21. I think it's also worth pointing out that Jest does this with their "property matchers": https://jestjs.io/docs/en/snapshot-testing#property-matchers.