pytest-regressions Support for "ignoring a list of specific fields"

I was using this tool to generate our regression baselines, the result is a JSON string like:

{
"id": "4163B50C4DD9911804894F827078B218",
"instanceId": "020B24C54D891A89B44AAABDBCFCE057",
 "status": 1
}

The problem is that there are some fields such as the instanceId field that is using UUID that will always change, and I don't want this false alert. One workaround would be:

We provide the ignores=["instanceId",] to the data_regression function, and the value of this field will be ignored when comparing the result.
Or, we can replace the value of specific field using special string, such as "{% ignore %}" or something else;

Aug 06 '20 01:08 hdw868

Hi @hdw868,

Couldn't you preprocess the data yourself before passing to the fixture? Btw, are you using data_regression or file_regression?

Aug 06 '20 02:08 nicoddemus

Hi @nicoddemus, Yes, I could process the data before passing to the fixture, but usually, those are very common fileds such as "id", ''creationDate" and etc, I think it would much more convenient to provide a list of keys to be ignored instead of calling those reprocessing funtion.

Aug 06 '20 08:08 hdw868

Hi @hdw868,

While it might seem convenient to ignore just some top level keys, this doesn't cover nested structures:

{
  "orders": [
     {
       "id": "aksjsnj-10291",
     }
  ],
  "product" : {
    "id": "215",
  },
}

If a user wants to remove the id key from orders but not from product, they will need to do it themselves anyway.

For the simple use case, the proposal to ignore just a top-level key, the code:

del data["id"]
data_regression(data)

Becomes:

data_regression(data, ignore_key=["id"])

Which is not a big win IMHO.

So particularly I'm 👎 on adding this, but let's hear what others think.

Aug 06 '20 11:08 nicoddemus

We had the same issue using data_regression for SQLAlchemy models.

Below is a recipe for a new fixture based on data_regression that would ignore some fields of the data given to the check method. The recipe uses regex patterns, but I think you can simplify it for your own purpose. You can write this code into your project conftest.py, update the class member fields_to_ignore and use the db_model_regression.check instead.

@pytest.fixture()
def db_model_regression(data_regression):
    return DbModelRegression(data_regression)


class DbModelRegression:

    fields_to_ignore = ['updated_on', 'created_on', r'\w*_id', '^id$']

    def __init__(self, data_regression):
        self.data_regression = data_regression

    def check(self, data, **kwargs):
        return self.data_regression.check(self._prepare_for_regression(data), **kwargs)

    def _should_record(self, field_name):
        for pattern in self.fields_to_ignore:
            if re.match(pattern, field_name):
                return False
        else:
            return True

    def _prepare_for_regression(self, data):
        if isinstance(data, list):
            prepared_data = []
            for item in data:
                prepared_data.append(self._prepare_dict(item))
            return prepared_data
        elif isinstance(data, dict):
            return self._prepare_dict(data)
        else:
            return data

    def _prepare_dict(self, data):
        prepared_data = {}
        for key in filter(self._should_record, data):
            value = data[key]
            if isinstance(value, list):
                prepared_list = []
                for item in value:
                    prepared_list.append(self._prepare_for_regression(item))
                prepared_data[key] = prepared_list
            elif isinstance(value, dict):
                prepared_data[key] = self._prepare_for_regression(value)
            else:
                prepared_data[key] = value
        return prepared_data

I agree with you that we must find a more elegant way of address that. But I don't think that adding an ignores option is the answer. Maybe adding some highlevel fixtures that knows how to deal with objects instead of dicts. I'll try to put some thought on it.

Aug 06 '20 12:08 igortg

Hi @hdw868,

While it might seem convenient to ignore just some top level keys, this doesn't cover nested structures:
{
  "orders": [
     {
       "id": "aksjsnj-10291",
     }
  ],
  "product" : {
    "id": "215",
  },
}
If a user wants to remove the id key from orders but not from product, they will need to do it themselves anyway.

For the simple use case, the proposal to ignore just a top-level key, the code:
del data["id"]
data_regression(data)
Becomes:
data_regression(data, ignore_key=["id"])
Which is not a big win IMHO.

So particularly I'm 👎 on adding this, but let's hear what others think.

Yes, provide a ignore option only may make things complicated anyway, however, the workaround provided by @igortg seems to have the same problem. Just wondering if there is a more elegant way to handle this situation.

Aug 07 '20 00:08 hdw868

A more elegant way would be building your own serializer to convert objects to dicts without the unwanted fields. Pydantic has something like that, but you have to specify the attribute you want to ignore also in the nested objects.

Aug 07 '20 20:08 igortg

Hmm, would adding exclude options like pydantic be useful? I may try to create a PR if it's worth trying.

Aug 10 '20 01:08 hdw868

As an idea, I think it would be nice to have a more generic solution that can handle not just ignoring specific fields, but also custom comparison behaviour.

For example, one may want to be able to

ignore the value of id but assert that id field is present
ignore the value but assert that it is a positive integer
assert that a float equals expected value with some tolerance
assert that a list contains expected items ignoring order
assert that a string matches a regex

One way to achive this could be by providing a way to override fields' values with any object that defines an __eq__ method. Something like

from unittest.mock import ANY
from pytest import approx

data = {
    "orders": [{
        "id": "aksjsnj-10291",
    }],
    "product" : {
        "id": "215",
    },
    "gravity": 9.81,
    # ...100500 other fields
}

data_regression.check(
    data,
    {'product': {'id': ANY}, 'gravity': approx(10, 0.1)}
)

Aug 12 '20 19:08 qweeze

One way to achive this could be by providing a way to override fields' values with any object that defines an eq method.

At first glance I like this idea, seems to include a nice general way to customize how things are compared.

Aug 13 '20 11:08 nicoddemus

I forked the library (https://github.com/adamzev/pytest-regressions) and implemented a solution that works for my use case. I focused just on data regression yml files. For lines you want to ignore you can add a # ignore comment on them in the yml file and it will ignore differences in those lines. So you don't have to manually enter what lines to ignore, I added a --force-ignore flag which will tag every line which is currently showing up as different (for me, running the tests in a different order produces different model ids which can be deeply nested keys or values) to ignore in the future.

If this would work for other people I'd be happy to look at what needs to be done to make this able to be brought in.

Nov 11 '21 20:11 adamzev

FWIW, here is a relevant discussion in a similar library https://github.com/syrusakbary/snapshottest/issues/21. I think pointing out that jest does this with their "property matchers" is nice https://jestjs.io/docs/en/snapshot-testing#property-matchers.

Oct 10 '22 15:10 danlamanna

pytest-regressions pytest-regressions copied to clipboard

Support for "ignoring a list of specific fields"

pytest-regressions
pytest-regressions copied to clipboard