pollyjs icon indicating copy to clipboard operation
pollyjs copied to clipboard

Need to regenerate records if `matchRequestsBy` changes

Open skimi opened this issue 5 years ago • 13 comments

Description

If I understand things correctly, having the id hardcoded in the har files makes it so that when matchRequestBy changes no entry is valid. The recordings are in fact still viable but since the id is generated based on matchRequestBy all recordings need to be recreated.

That becomes a big problem when you have a lot of recordings, even the slightest change to matchRequestBy requires to regenerate everything!

I use matchRequestBy to ignore things that are dynamic or may change between two runs. Dates, tokens, some ids, etc... The config is mostly very static but in the case where it needs to change it creates a massive break.

What I would like to see is har files be a pure raw recording of the requests and be independant of the matchRequestBy.

Shareable Source

All entries in har files have this:

{
  "_id": "1f62d02515e4ee31cf40fafc4d01f127",
  "request": ...
}

skimi avatar Mar 11 '19 13:03 skimi

The request id that is associated with each HAR entry is required to determine which entry belongs to which request as multiple configuration options can dictate how they are matched.

If I understand things correctly, having the id hardcoded in the har files makes it so that when matchRequestBy changes no entry is valid.

The ID is an md5 hash of all the values that you've opted into via the matchRequestsBy configuration options. If a value changes (e.g. a query param or a header) then the id would change accordingly.

That becomes a big problem when you have a lot of recordings, even the slightest change to matchRequestBy requires to regenerate everything!

Polly is smart enough to know which requests are newly recorded and is able to merge them into a pre existing HAR file as well as clean up any requests that were no longer used within the lifecycle of the instance. A change of the config doesnt necessary mean that it will match a request to a new ID.

Can you provide an example of the config you're using and how you're using it? What sort of data are you trying to exclude and why does it need you to change your configuration dynamically?

offirgolan avatar Mar 11 '19 17:03 offirgolan

The config looks like this.

It's written as to ignore user specific params in the request like the authorization and restaurantId. Pollyjs allows me to completely ignore what user (or restaurant in my case) you are using to run your test. So dev-A might be using the user1 in his test config and dev-B might be using user2 but since pollyjs ignores those things with matchRequestsBy anything recorded by dev-B will still work for dev-A and vice versa.

export default {
  persister: 'rest',
  logging: true,
  recordIfMissing: false,
  recordFailedRequests: true,
  timing: Timing.fixed(0),
  matchRequestsBy: {
    headers: ignoreAuthorizationHeader,
    body: ignoreRestaurantIdInGraphQl,
    url: {
      protocol: false,
      hostname: false,
      pathname: ignoreRestaurantId,
    },
  },
};

The ID is an md5 hash of all the values that you've opted into via the matchRequestsBy configuration options. If a value changes (e.g. a query param or a header) then the id would change accordingly.

It changes because it's a new record no? No entry matched the new id and the record was regenerated. In my case I don't use recordIfMissing because it's too unpredictable. Given 10 requests, if the 5th one needs to be re-recorded, chances are it won't work because it would be made with an outdated Authorization token on a backend that might not be setup anymore for the test. So it would likely fail. Instead if I need to re-record something I switch polly to record mode and re-record the test from the start.

Anyway, the har file itself seems valid to me. The request has not changed, just my matchRequestBy config. So it's weird that I need to hit my backend again while I already have the request recorded.

skimi avatar Mar 12 '19 08:03 skimi

So dev-A might be using the user1 in his test config and dev-B might be using user2 but since pollyjs ignores those things with matchRequestsBy anything recorded by dev-B will still work for dev-A and vice versa.

I would recommend normalizing these data points instead of just ignoring them. Doing so should create a much more stable test environment and would reduce the chances of generating different recorded datasets.

// Replace all authorization token headers with a test user one
server.any().on('request', req => {
  if (req.hasHeader('Authorization')) {
    req.setHeader('Authorization', TEST_USER_TOKEN);
  }
});

// Replace all graphql calls with a restaurantId in the variables to use a test restaurant id
server
  .post('/graphql')
  .filter(req => 'restaurantId' in req.jsonBody().variables)
  .on('request', req => {
    const body = req.jsonBody();

    body.variables.restaurantId = TEST_RESTAURANT_ID;

    req.json(body);
  });

So it's weird that I need to hit my backend again while I already have the request recorded.

After reading this, I now understand what your original issue is. Unfortunately, there is no way to really get around this as the ID acts as a unique identifier to the actual request since there are so many possible ways to match a request to the parameters specified in the config. If the ID changes, we have no way of knowing if the previous recorded data is still valid so the safest assumption is to fetch a new one that will then truly match the ID.

I would love to understand your use case more as I'm still uncertain as to why your matchRequestsBy config is changing so much that new IDs need to be generated. Can you provide some more examples?

offirgolan avatar Mar 12 '19 22:03 offirgolan

The matchRequestsBy isn't changing that much but it can. Sometimes.

In this case, we've added graphql endpoints so I had to add the line body: ignoreRestaurantIdInGraphQl in the matchRequestBy to keep ignoring the restaurant. It does exactly what you laid out in your snippet and removes a variable. That is rare but just changing that parameter meant that all previously recorded calls to REST api also became invalid.

Couldn't the id be generated when pollyjs starts instead of hard coding it when saving the har file?


Anyway, I'll layout how I use pollyjs and why I chose to ignore session specific variables in the requests.

We use Pollyjs within Cypress tests. Before pollyjs we were mocking all requests by hand with some other libs. We migrated to polly because it allows us to not maintain the mocks at all 👍. Instead we actually request the API the first time the test is run and then pollyjs saves and mocks everything. No more mocks to maintain 🎉.

BUT we need to actually have a test user existing on the backend and we need multiple devs to be able to collaborate on the same backend. They can't be using the same user to write tests without risking conflicts.

Our problem is that we can't pop one backend per developer.

So we allow each dev to setup their own user and matchRequestBy is used to ignore that parameter so devA records work for devB.

skimi avatar Mar 13 '19 10:03 skimi

Couldn't the id be generated when pollyjs starts instead of hard coding it when saving the har file?

They already are. The ID is generated every time a request goes through Polly and is stored with the recording. When a new request comes in, Polly generates an ID for it and sees if it matches any existing one that was recorded.

They can't be using the same user to write tests without risking conflicts.

Is this because the test cases are mutating the database on each test run without doing any cleanup to restore it to its previous state? Or is this because you have test cases running in parallel that could somehow conflict?

offirgolan avatar Apr 04 '19 22:04 offirgolan

They already are. The ID is generated every time a request goes through Polly and is stored with the recording.

What I mean by when pollyjs starts is when the persister is instantiated. At that time it could generate the id for all its records (might be long, don't know). That way the id would always be generated with the latest matchRequestsBy config.

If the ID changes, we have no way of knowing if the previous recorded data is still valid so the safest assumption is to fetch a new one that will then truly match the ID.

Since the har file is a raw recording of the request it would still have all the necessary information needed to generate a new ID that would match the new matchRequestsBy setting. Instead of fetching again regenerate the IDs for the existing recordings. Then maybe fetch if it still does not match.

Anyway, I know it's an edge case. My first assumption that any changes to matchRequestBy require a record of everything is actually false. Requests that are not affected by the change won't actually need to be re-fetched. I just had a case where a single change affected everything.

If possible, maybe pollyjs could not hardcode the id in the har-file. That would avoid a few fetch sometimes

skimi avatar May 03 '19 08:05 skimi

FWIW I am just starting to use polly for a workflow of:

  1. Record with a secret token (to get back valid responses from vendor that requires auth),
  2. take the secret token out of the HAR (ideally via #251 which I just filed),
  3. update the config to matchRequestsBy-out any token (either real or fake) (ideally I would have done this first, but pragmatically the easiest way to know where the token is is to just do a recording and then go look for it, and then update matchRequestsBy to ignore it),
  4. use a fake token in CI that uses the recorded vendor response even though we're using a different token.

I eventually got it to work, but several times I wanted to just manually massage the HAR file entries (take the token out of the body, take out the user agent header), but got thrown off b/c my changing their contents didn't re-derive the _id (so my updated-recording would have matched my updated-actual, but it didn't because of the stale _id, so I was forced to re-record).

Personally my workflow would have been much smoother if Polly just re-derived the _id for each HAR file entry (based on the latest matchRequestsBy) instead of using a stale value.

I'm also not generally following why the persisting the GUID is necessary at all (although I definitely get calc'ing it makes matching easier vs. doing "scan all recordings" on each incoming request), i.e. if I think / AFAIU of the fundamental matching that polly is doing, it's:

hash(matchRequestsBy(just-seen-request)) == hash(matchRequestsBy(persisted-request))

...so, if I were to ever change persisted-request (by hand-editing the HAR file) or change matchRequestsBy (by changing my config), when would I ever want that old id value to stay the same?

Isn't that old id defacto worthless because for: a) hand-editing the HAR file, the id no longer matches what I intuitively assert (by reading the rest of the HAR file) think my request is matching again, or b) for changing the matchRequestsBy, it will also change the left-side just-seen-request hash. In either scenario, the left-side id either changed or, or my right-side id is now stale/wrong.

So, basically, I can't see when keeping the old _id would be useful, so why not just use the latest /calc'd hash(matchRequestsBy(persisted-request)) value for each entry in the HAR file?

stephenh avatar Sep 13 '19 19:09 stephenh

@stephenh _id could be computed at runtime it's just not the implementation today.

If there is enough interest, I think we could make the necessary changes but the work involved to get there is pretty significant.

jasonmit avatar Sep 18 '19 00:09 jasonmit

@jasonmit If not regenerating the ids at startup maybe regenerating the ids on demand via a cli command I had to edit some har files manually like @stephenh and I ended up hacking away at bits of polly code to create a script regenerating the ids. So it's possible but none of the parts needed are currently exported by polly. Roughly it means exporting the identify method of PollyRequest

skimi avatar Aug 18 '20 07:08 skimi

@skimi would you mind sharing the script you wrote up to do this?

offirgolan avatar Sep 07 '20 03:09 offirgolan

@offirgolan woops, did not see your message. I've copy/paste the ugly script in this gist https://gist.github.com/skimi/79f26f1ac502ac2102119347a4aca11f. It's using NormalizeRequest to regenerate the ids. The issue being that pollyjs does not export NormalizeRequest so I just copy/pasted it... Did not update since I did it with v4 but it's still working in v5 for me

Don't judge, I just did something that worked quickly

skimi avatar Feb 16 '21 09:02 skimi

@skimi thank you very much for sharing! it worked like a charm 🎉

fsblemos avatar Jan 16 '24 21:01 fsblemos

@offirgolan can we consider to expose the NormalizeRequest so that we can use the @skimi's script without copy/pasting it? I'm willing to open a PR with this.

fsblemos avatar Jan 16 '24 21:01 fsblemos