vcrpy
'before_record_request' is called twice on the same request
Hello :wave:
It looks like the function passed as before_record_request when creating a VCR instance gets called twice with the same request. Here's some test code:
import vcr
import aiohttp
import asyncio
import requests


def filter_request(request):
    print("Filtering...")
    # Return the request so vcrpy keeps processing it;
    # returning None tells vcrpy to ignore the request.
    return request


VCR = vcr.VCR(before_record_request=filter_request)

# requests
with VCR.use_cassette("google.yaml"):
    data = requests.get("http://www.google.com")


# aiohttp
async def main():
    with VCR.use_cassette("google.yaml"):
        async with aiohttp.ClientSession() as session:
            async with session.get("http://www.google.com") as response:
                data = await response.read()


asyncio.run(main())
The example above prints Filtering... four times: twice for the requests section and twice for the aiohttp section. Normally this wouldn't be a huge issue, but I discovered it while using a regex filter to alter the URL before storing the request. The pattern happened to match twice: once for the original URL to be filtered (intended), and a second time for the already filtered URL (not intended), which completely malformed it and caused errors down the line.
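Until the double call is addressed, one possible workaround is to make the filter idempotent, so that applying it a second time leaves the URL unchanged. The sketch below is only an illustration: the /users/<name> URL scheme, the anon_ prefix, and the SimpleNamespace stand-in for a request are all hypothetical, and it relies only on the filter reading and assigning request.uri as in the examples in this thread.

```python
import re
from types import SimpleNamespace

# Hypothetical rule: rewrite /users/<name> to /users/anon_<name>.
NAIVE_PATTERN = re.compile(r"/users/(\w+)")
# The negative lookahead skips names that already carry the anon_ prefix.
IDEMPOTENT_PATTERN = re.compile(r"/users/(?!anon_)(\w+)")


def naive_filter(request):
    # The output still matches the pattern, so a second call rewrites it again.
    request.uri = NAIVE_PATTERN.sub(r"/users/anon_\1", request.uri)
    return request


def idempotent_filter(request):
    # Already filtered URLs no longer match, so a second call is a no-op.
    request.uri = IDEMPOTENT_PATTERN.sub(r"/users/anon_\1", request.uri)
    return request


def apply_twice(filter_func, uri):
    # Stand-in object with a .uri attribute (not a real vcrpy request).
    request = SimpleNamespace(uri=uri)
    filter_func(filter_func(request))
    return request.uri


naive_result = apply_twice(naive_filter, "http://example.com/users/bob")
idempotent_result = apply_twice(idempotent_filter, "http://example.com/users/bob")
print(naive_result)
print(idempotent_result)
```

A filter written this way could be passed as before_record_request without being affected by how many times it is invoked per request.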
I originally discovered this after getting the following traceback (and doing some debugging on my own) with a VCR instance running in the "new_episodes" recording mode, which means I normally should never have gotten this error:
Traceback (most recent call last):
  File "<censored>", line 158, in request
    async with self._http_session.get(req_url) as response:
  File "C:\Python38\lib\site-packages\aiohttp\client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "C:\Python38\lib\site-packages\vcr\stubs\aiohttp_stubs\__init__.py", line 187, in new_request
    return play_responses(cassette, vcr_request)
  File "C:\Python38\lib\site-packages\vcr\stubs\aiohttp_stubs\__init__.py", line 89, in play_responses
    vcr_response = cassette.play_response(vcr_request)
  File "C:\Python38\lib\site-packages\vcr\cassette.py", line 265, in play_response
    raise UnhandledHTTPRequestError(
vcr.errors.UnhandledHTTPRequestError: "The cassette ('main.yaml') doesn't contain the request (<Request (GET) http://<censored_unfiltered_url>>) asked for"
I figured I'd attach some minimal code that'd let you reproduce the issue:
import vcr
import requests


def filter_request(request):
    # anything that changes the URL again when called a second time will work here
    request.uri = request.uri[:-5]  # trim last 5 characters
    print("Filtered URL: {}".format(request.uri))
    return request


VCR = vcr.VCR(before_record_request=filter_request)

with VCR.use_cassette("google.yaml"):
    data = requests.get("http://www.google.com")
Running it once just records and creates the google.yaml file, but running it a second time gives this error:
UnhandledHTTPRequestError: "The cassette ('google.yaml') doesn't contain the request (<Request (GET) http://www.google.com/>) asked for"
Debugging the library code shows that the two calls responsible for this are both in the cassette.py file. The can_play_response_for method calls the filter the first time, and then the "request in self" expression in its return statement delegates to the __contains__ method, which in turn delegates to the _responses method, which calls the filter again: https://i.imgur.com/ReFFNhx.png
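The call chain described above can be modelled with a small toy class. This is a simplified stand-in written for illustration, not vcrpy's actual Cassette implementation: the ToyCassette name and its internals are assumptions, and only the method names and the order of delegation follow the description in this thread.

```python
class ToyCassette:
    """Simplified model of the delegation chain (not vcrpy's real code)."""

    def __init__(self, before_record_request):
        self._before_record_request = before_record_request
        self.data = []  # list of (stored_request, response) pairs

    def _responses(self, request):
        # Second place where the filter runs.
        request = self._before_record_request(request)
        return [resp for stored, resp in self.data if stored == request]

    def __contains__(self, request):
        # "request in self" ends up here and delegates to _responses.
        return len(self._responses(request)) > 0

    def can_play_response_for(self, request):
        # First place where the filter runs.
        request = self._before_record_request(request)
        return bool(request) and request in self


calls = []


def counting_filter(request):
    # Record every invocation so the double call becomes visible.
    calls.append(request)
    return request


cassette = ToyCassette(counting_filter)
cassette.can_play_response_for("http://www.google.com/")
print(len(calls))  # 2: once in can_play_response_for, once in _responses
```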
Hi @DevilXD, I am facing a similar issue. However, I think that for your demo script, calling before_record_request twice isn't the reason for the failure.
google.yaml does have the correct filtered URL:
uri: http://www.google
When the yaml is loaded into Cassette's self.data, it is filtered again, and the URL becomes http://www.g.
In that case, _before_record_request is called twice by can_play_response_for, which makes request match the http://www.g in self.data, so can_play_response_for returns True. 😲
https://github.com/kevin1024/vcrpy/blob/535efe1eb92e894ccadc5515e0642b058c8c31f0/vcr/cassette.py#L252-L254
Where it really fails is at play_response:
https://github.com/kevin1024/vcrpy/blob/535efe1eb92e894ccadc5515e0642b058c8c31f0/vcr/cassette.py#L256-L268
This time, request is only filtered once by _responses and becomes http://www.google, which of course doesn't match http://www.g.
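The sequence of trims can be traced with plain string slicing. This is only a sketch of the arithmetic and does not call vcrpy; the variable names are made up for illustration, and the number of filter passes at each stage is taken from the explanation above.

```python
def trim(uri):
    # Same transformation as the demo filter: drop the last 5 characters.
    return uri[:-5]


incoming = "http://www.google.com/"  # URI of the live request

stored_in_yaml = trim(incoming)        # written to google.yaml on the first run
loaded_in_data = trim(stored_in_yaml)  # filtered again when the cassette is loaded

# can_play_response_for filters the incoming request twice before comparing.
seen_by_can_play = trim(trim(incoming))
# play_response filters it only once (inside _responses) before comparing.
seen_by_play = trim(incoming)

print(stored_in_yaml, loaded_in_data)
print(seen_by_can_play == loaded_in_data)  # match, so playback is attempted
print(seen_by_play == loaded_in_data)      # mismatch, so the error is raised
```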