'before_record_request' is called twice on the same request

Open DevilXD opened this issue 4 years ago • 2 comments

Hello :wave:

It looks like the function passed as before_record_request when creating a VCR instance gets called twice with the same request. Here's some test code:

import vcr
import aiohttp
import asyncio
import requests

def filter_request(request):
    print("Filtering...")
    # Return the request so it is still recorded; returning None tells vcrpy to drop it
    return request

VCR = vcr.VCR(before_record_request=filter_request)

# requests
with VCR.use_cassette("google.yaml"):
    data = requests.get("http://www.google.com")

# aiohttp
async def main():
    with VCR.use_cassette("google.yaml"):
        async with aiohttp.ClientSession() as session:
            async with session.get("http://www.google.com") as response:
                data = await response.read()

asyncio.run(main())

The example above prints Filtering... four times: twice for the requests section and twice for the aiohttp section. Normally this wouldn't be a huge issue, but I discovered it while using a regex filter to alter the URL before storing the request. The pattern happened to match twice: once against the original URL (intended), and a second time against the already filtered URL (not intended), which completely malformed it and caused errors down the line.
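
To illustrate why a second call matters, here is a minimal sketch of a filter that is not idempotent. The strip_token function, its pattern, and the example.com URL are all hypothetical stand-ins, not the filter from my project; the point is only that applying the same substitution to its own output changes the URL again.

```python
import re

# Hypothetical filter: drop the last path segment (say, a secret token).
def strip_token(url):
    return re.sub(r"/[^/]+$", "", url)

original = "http://example.com/api/users/SECRET"
once = strip_token(original)   # what the filter is meant to produce
twice = strip_token(once)      # what a second call produces

print(once)
print(twice)
```

A filter whose output no longer matches its own pattern would not be affected by the extra call, which is why the problem only shows up with some patterns.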

Originally discovered after getting the following traceback (and doing some debugging on my own), for a VCR instance running in the "new_episodes" recording mode (meaning that normally I should've never gotten this):

Traceback (most recent call last):
  File "<censored>", line 158, in request
    async with self._http_session.get(req_url) as response:
  File "C:\Python38\lib\site-packages\aiohttp\client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "C:\Python38\lib\site-packages\vcr\stubs\aiohttp_stubs\__init__.py", line 187, in new_request
    return play_responses(cassette, vcr_request)
  File "C:\Python38\lib\site-packages\vcr\stubs\aiohttp_stubs\__init__.py", line 89, in play_responses
    vcr_response = cassette.play_response(vcr_request)
  File "C:\Python38\lib\site-packages\vcr\cassette.py", line 265, in play_response
    raise UnhandledHTTPRequestError(
vcr.errors.UnhandledHTTPRequestError: "The cassette ('main.yaml') doesn't contain the request (<Request (GET) http://<censored_unfiltered_url>>) asked for"

DevilXD • Apr 29 '20 11:04

I figured I'd attach some minimal code that'd let you reproduce the issue:

import vcr
import requests

def filter_request(request):
    # anything that may end up changing the URL every time when called twice will work here
    request.uri = request.uri[:-5]  # trim last 5 characters
    print("Filtered URL: {}".format(request.uri))
    return request

VCR = vcr.VCR(before_record_request=filter_request)

with VCR.use_cassette("google.yaml"):
    data = requests.get("http://www.google.com")

Running it once just records and creates the google.yaml file, but running it a second time gives this error:

UnhandledHTTPRequestError: "The cassette ('google.yaml') doesn't contain the request (<Request (GET) http://www.google.com/>) asked for"

Debugging the library code shows that both calls happen in the cassette.py file. The can_play_response_for method calls the hook the first time. Then the request in self part of its return statement delegates to the __contains__ method, which in turn delegates to the _responses method, which calls the hook again: https://i.imgur.com/ReFFNhx.png
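
The call chain can be modelled with a toy class. This is a simplified sketch, not vcrpy's actual source: MiniCassette is a hypothetical stand-in that works on plain URI strings and keeps only the three methods named above, so that the two hook calls can be counted.

```python
class MiniCassette:
    """Toy stand-in for the call chain described above; not vcrpy's real code."""

    def __init__(self, before_record_request, stored_uris):
        self._before_record_request = before_record_request
        self.data = list(stored_uris)

    def _responses(self, uri):
        # Second hook call, reached through __contains__
        uri = self._before_record_request(uri)
        return [stored for stored in self.data if stored == uri]

    def __contains__(self, uri):
        return bool(self._responses(uri))

    def can_play_response_for(self, uri):
        # First hook call
        uri = self._before_record_request(uri)
        return uri is not None and uri in self


calls = []

def hook(uri):
    # Record every invocation, leave the URI unchanged
    calls.append(uri)
    return uri

cassette = MiniCassette(hook, ["http://www.google.com/"])
can_play = cassette.can_play_response_for("http://www.google.com/")
print(len(calls), can_play)
```

With a hook that leaves the URI unchanged, the double call is harmless; it only becomes visible once the hook modifies what it is given.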

DevilXD • Apr 30 '20 08:04

Hi @DevilXD, I am facing a similar issue. However, I think for your demo script, calling before_record_request twice isn't the reason for the failure.

google.yaml does have the correct filtered URL:

    uri: http://www.google

When the yaml is loaded into Cassette's self.data, it is filtered again, and the URL becomes http://www.g.

In that case, _before_record_request is called twice by can_play_response_for, so the request ends up matching the http://www.g in self.data, and can_play_response_for returns True. 😲

https://github.com/kevin1024/vcrpy/blob/535efe1eb92e894ccadc5515e0642b058c8c31f0/vcr/cassette.py#L252-L254

Where it really fails is at play_response:

https://github.com/kevin1024/vcrpy/blob/535efe1eb92e894ccadc5515e0642b058c8c31f0/vcr/cassette.py#L256-L268

This time, request is only filtered once by _responses and becomes http://www.google, which of course doesn't match http://www.g.
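
The sequence can be traced with plain strings. This is only a sketch of the reasoning above, assuming the trim-five-characters filter from the demo script; the variable names are made up for illustration and nothing here calls vcrpy.

```python
def trim(uri):
    # Same transformation as the demo filter: drop the last 5 characters
    return uri[:-5]

requested = "http://www.google.com/"

recorded = trim(requested)             # written to google.yaml on the first run
loaded = trim(recorded)                # filtered again when the cassette is loaded

seen_by_can_play = trim(trim(requested))   # hook applied twice
seen_by_play_response = trim(requested)    # hook applied once

print(recorded, loaded)
print(seen_by_can_play == loaded)        # lookup succeeds
print(seen_by_play_response == loaded)   # lookup fails
```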

jiasli • Jul 10 '20 08:07