vcrpy before_record_response returns inconsistent data depending on recording or not

before_record_response returns inconsistent data depending on recording or not

Open Diaoul opened this issue 3 years ago • 12 comments

It's all in the title and seems to be related to #356 My tests depend on some sensitive data to be at the value I replaced it with. So when I run the tests I have sometimes the real data (which I don't want) and sometimes the fake data actually in the cassette.

This is an inconsistent behaviour and, depending on what the data is used for afterwards, could lead to sensitive information being leaked when recording cassettes. For example, in the test logs, integration servers, etc.

Jul 11 '20 09:07 Diaoul

I think the use case of having subsequent requests depending on the real data like a login behaviour as described in #355 is valid although I think this should not be solved this way. Maybe by making it an optional behaviour?

Jul 11 '20 09:07 Diaoul

Maybe an additional before_handle_response option could provide a callback for modifying the response? The modification would apply both when returning data for a real response and when saving the response into a cassette.

I drafted a patch for this in #594 – it invokes the callback(s) here in vcr.stubs.VCRConnection.getresponse():

            # get the response
            response = self.real_connection.getresponse()

            # put the response into the cassette
            response = {
                "status": {"code": response.status, "message": response.reason},
                "headers": serialize_headers(response),
                "body": {"string": response.read()},
            }
            response = copy.deepcopy(response)                          ## callback 
            response = self.cassette._before_handle_response(response)  ## invoked here

            self.cassette.append(self._vcr_request, response)
        return VCRHTTPResponse(response)

My use case for this feature is a test suite for an API client which gets very large JSON responses from an actual API server. To reduce the amount of data in the test suite, I'd like to trim down the data but still make it simple to recreate the cassettes without having to manually tweak them.

I did try to use requests-mock and VCRpy together, but it seems requests-mock's patches disable VCRpy and no cassettes get written.

Jul 05 '21 11:07 akaihola

Maybe by making it an optional behaviour?

@Diaoul, I added an alternative solution in #595 which does exactly that: you still define callbacks for modifying responses using the before_record_response option, but you can also specify alter_live_response = True to see those modifications in live responses when cassettes are being recorded.

Jul 05 '21 13:07 akaihola

@kevin1024 do you think this enhancement is valid? The idea is to make sure the responses seen by the user's code are identical between recording and playback modes.

Your thoughts on the two different approaches in #594 and #595?

I think the pros and cons of #594 are:

PRO: user just needs to change from before_record_response to before_handle_response with no need for additional options
CON: code duplication
NOTE: I'm not sure whether it was appropriate to handle decode_compressed_response here, too

As for #595:

PRO: no code duplication
CON: two options to set (before_record_response and alter_live_response)
CON: adds confusion – I think this should be the default behavior actually
NOTE: private attribute access from stubs to Cassette._alter_live_response

If you accept the feature and one of these approaches, I'll proceed to writing unit tests, documentation and a change log entry.

Jul 05 '21 15:07 akaihola

Hmm, how does Ruby VCR handle this case?

And is there a way to make this change backwards-compatible?

I have to admit I don’t have much time to maintain VCR these days so I appreciate your time!

Jul 05 '21 15:07 kevin1024

Ruby VCR seems to have an after-http-request hook. Based on the documentation it's unclear whether modifying the response is allowed.

In the Stack Overflow question VCR ignored parameter: Get real http request, user 23tux says:

I tried it out with c.after_http_request(:stubbed?) where I get the response body AND the real value of the callback parameter, but this hook is too late, because my app already received the response body.

So maybe after-http-request hook of Ruby VCR doesn't actually do what I'm looking for either.

Both #594 and #595 should be backwards-compatible.

Are there other active contributors to VCRpy who I could ping and ask opinions from?

Also, I'm wondering what actually is the use case for modifying the recorded response but still passing through the original one when recording?

Jul 05 '21 17:07 akaihola

Also, I'm wondering what actually is the use case for modifying the recorded response but still passing through the original one when recording?

Ah! That one I can answer. It's usually to keep credentials/secrets from ending up in the repo.

Jul 06 '21 02:07 kevin1024

RE #594 vs #595: You understand this problem more deeply than I do, so I trust your judgement and I'll happily merge your PR as long as it has tests and doesn't change anything (test suites or recorded cassettes) in a backwards-incompatibile way :)

Edit: Oh - and documentation as well!

Jul 06 '21 02:07 kevin1024

To summarize the difference between #594 and #595 from the user's perspective if she wishes her code to receive modified responses also when they are being recorded:

#594

def vcr_config():
    return {
        "before_handle_response": [my_callback],
    }

#595

def vcr_config():
    return {
        "before_record_response": [my_callback],
        "alter_live_response": True,
    }

Jul 07 '21 06:07 akaihola

Hello, another happy user of vcrpy here.

From API perspective I like before_handle_response better, because alter_live_response changes meaning of already passed hook, which indeed might be confusing.

I am sure the duplication can be factored out or even left as-is.

Jul 07 '21 08:07 samoylovfp

I like before_handle_response better, because alter_live_response changes meaning of already passed hook

@samoylovfp, that makes sense. I closed #595 and will proceed with #594.

I wonder about the naming of the new hook before_handle_response. It allows modifying the response before it is recorded or returned to user code. Does the word handle properly communicate this? Or should we follow Ruby VCR's example and name it e.g. after_receive_response?

I also wonder whether there are any use cases where it's essential that the original unmodified response is seen by user code when recording. At least from a unit testing perspective, I'd say it's more important to actually get the identical response in both recording and playback situations.

If there are no use cases for before_record_response, maybe it could even be deprecated (with a long deprecation period) or its use discouraged in documentation?

Jul 07 '21 08:07 akaihola

Reminder to self: I need to continue working on this pull request.

Sep 21 '21 06:09 akaihola

vcrpy vcrpy copied to clipboard

before_record_response returns inconsistent data depending on recording or not

#594

#595

vcrpy
vcrpy copied to clipboard