pytest-regressions icon indicating copy to clipboard operation
pytest-regressions copied to clipboard

Support for "ignoring a list of specific fields"

Open hdw868 opened this issue 4 years ago • 11 comments

I was using this tool to generate our regression baselines, the result is a JSON string like:

{
"id": "4163B50C4DD9911804894F827078B218",
"instanceId": "020B24C54D891A89B44AAABDBCFCE057",
 "status": 1
}

The problem is that there are some fields such as the instanceId field that is using UUID that will always change, and I don't want this false alert. One workaround would be:

  • We provide the ignores=["instanceId",] to the data_regression function, and the value of this field will be ignored when comparing the result.

  • Or, we can replace the value of specific field using special string, such as "{% ignore %}" or something else;

hdw868 avatar Aug 06 '20 01:08 hdw868

Hi @hdw868,

Couldn't you preprocess the data yourself before passing to the fixture? Btw, are you using data_regression or file_regression?

nicoddemus avatar Aug 06 '20 02:08 nicoddemus

Hi @nicoddemus, Yes, I could process the data before passing to the fixture, but usually, those are very common fileds such as "id", ''creationDate" and etc, I think it would much more convenient to provide a list of keys to be ignored instead of calling those reprocessing funtion.

hdw868 avatar Aug 06 '20 08:08 hdw868

Hi @hdw868,

While it might seem convenient to ignore just some top level keys, this doesn't cover nested structures:

{
  "orders": [
     {
       "id": "aksjsnj-10291",
     }
  ],
  "product" : {
    "id": "215",
  },
}

If a user wants to remove the id key from orders but not from product, they will need to do it themselves anyway.

For the simple use case, the proposal to ignore just a top-level key, the code:

del data["id"]
data_regression(data)

Becomes:

data_regression(data, ignore_key=["id"])

Which is not a big win IMHO.

So particularly I'm 👎 on adding this, but let's hear what others think.

nicoddemus avatar Aug 06 '20 11:08 nicoddemus

We had the same issue using data_regression for SQLAlchemy models.

Below is a recipe for a new fixture based on data_regression that would ignore some fields of the data given to the check method. The recipe uses regex patterns, but I think you can simplify it for your own purpose. You can write this code into your project conftest.py, update the class member fields_to_ignore and use the db_model_regression.check instead.

@pytest.fixture()
def db_model_regression(data_regression):
    return DbModelRegression(data_regression)


class DbModelRegression:

    fields_to_ignore = ['updated_on', 'created_on', r'\w*_id', '^id$']

    def __init__(self, data_regression):
        self.data_regression = data_regression

    def check(self, data, **kwargs):
        return self.data_regression.check(self._prepare_for_regression(data), **kwargs)

    def _should_record(self, field_name):
        for pattern in self.fields_to_ignore:
            if re.match(pattern, field_name):
                return False
        else:
            return True

    def _prepare_for_regression(self, data):
        if isinstance(data, list):
            prepared_data = []
            for item in data:
                prepared_data.append(self._prepare_dict(item))
            return prepared_data
        elif isinstance(data, dict):
            return self._prepare_dict(data)
        else:
            return data

    def _prepare_dict(self, data):
        prepared_data = {}
        for key in filter(self._should_record, data):
            value = data[key]
            if isinstance(value, list):
                prepared_list = []
                for item in value:
                    prepared_list.append(self._prepare_for_regression(item))
                prepared_data[key] = prepared_list
            elif isinstance(value, dict):
                prepared_data[key] = self._prepare_for_regression(value)
            else:
                prepared_data[key] = value
        return prepared_data

I agree with you that we must find a more elegant way of address that. But I don't think that adding an ignores option is the answer. Maybe adding some highlevel fixtures that knows how to deal with objects instead of dicts. I'll try to put some thought on it.

igortg avatar Aug 06 '20 12:08 igortg

Hi @hdw868,

While it might seem convenient to ignore just some top level keys, this doesn't cover nested structures:

{
  "orders": [
     {
       "id": "aksjsnj-10291",
     }
  ],
  "product" : {
    "id": "215",
  },
}

If a user wants to remove the id key from orders but not from product, they will need to do it themselves anyway.

For the simple use case, the proposal to ignore just a top-level key, the code:

del data["id"]
data_regression(data)

Becomes:

data_regression(data, ignore_key=["id"])

Which is not a big win IMHO.

So particularly I'm 👎 on adding this, but let's hear what others think.

Yes, provide a ignore option only may make things complicated anyway, however, the workaround provided by @igortg seems to have the same problem. Just wondering if there is a more elegant way to handle this situation.

hdw868 avatar Aug 07 '20 00:08 hdw868

A more elegant way would be building your own serializer to convert objects to dicts without the unwanted fields. Pydantic has something like that, but you have to specify the attribute you want to ignore also in the nested objects.

igortg avatar Aug 07 '20 20:08 igortg

Hmm, would adding exclude options like pydantic be useful? I may try to create a PR if it's worth trying.

hdw868 avatar Aug 10 '20 01:08 hdw868

As an idea, I think it would be nice to have a more generic solution that can handle not just ignoring specific fields, but also custom comparison behaviour.

For example, one may want to be able to

  • ignore the value of id but assert that id field is present
  • ignore the value but assert that it is a positive integer
  • assert that a float equals expected value with some tolerance
  • assert that a list contains expected items ignoring order
  • assert that a string matches a regex

One way to achive this could be by providing a way to override fields' values with any object that defines an __eq__ method. Something like

from unittest.mock import ANY
from pytest import approx

data = {
    "orders": [{
        "id": "aksjsnj-10291",
    }],
    "product" : {
        "id": "215",
    },
    "gravity": 9.81,
    # ...100500 other fields
}

data_regression.check(
    data,
    {'product': {'id': ANY}, 'gravity': approx(10, 0.1)}
)   

qweeze avatar Aug 12 '20 19:08 qweeze

One way to achive this could be by providing a way to override fields' values with any object that defines an eq method.

At first glance I like this idea, seems to include a nice general way to customize how things are compared.

nicoddemus avatar Aug 13 '20 11:08 nicoddemus

I forked the library (https://github.com/adamzev/pytest-regressions) and implemented a solution that works for my use case. I focused just on data regression yml files. For lines you want to ignore you can add a # ignore comment on them in the yml file and it will ignore differences in those lines. So you don't have to manually enter what lines to ignore, I added a --force-ignore flag which will tag every line which is currently showing up as different (for me, running the tests in a different order produces different model ids which can be deeply nested keys or values) to ignore in the future.

If this would work for other people I'd be happy to look at what needs to be done to make this able to be brought in.

adamzev avatar Nov 11 '21 20:11 adamzev

FWIW, here is a relevant discussion in a similar library https://github.com/syrusakbary/snapshottest/issues/21. I think pointing out that jest does this with their "property matchers" is nice https://jestjs.io/docs/en/snapshot-testing#property-matchers.

danlamanna avatar Oct 10 '22 15:10 danlamanna