vcrpy
vcrpy copied to clipboard
aiohttp stub records the full url in the recorded data regardless of parameter filtering
When using the filter_query_parameters argument, the aiohttp stub still records the full url given by the response object for later reconstruction of the yarl response. This is an unexpected behavior and can lead to secrets being leaked. Based on a review of the code, I think the Tornado stub may also have this same issue (but I haven't verified that). This leads us to having to manually scrub filtered values from cassettes to avoid secrets being committed into repositories.
Here is an example test case and recorded cassettes. The query parameter sometoken is intended to be dropped from storage (please ignore the fact that that the test target, httpbin, includes the query parameters in the response body).
import os
import json
import asyncio
import unittest
import vcr
import aiohttp
import requests
URL = 'https://httpbin.org/get?arg1=val1&sometoken=sekrit'
ASSETDIR = './'
FILTER_QUERY_PAREMETERS = ['sometoken']
class FilterQueryParamTest(unittest.TestCase):
# Helper code, replicated here
def _getVcrArgs(self):
kwargs = getattr(self, 'vcr_kwargs', {})
return kwargs
def getVcr(self, **kwargs):
# simplification for the vcr-unittest library.
fn = '{0}.{1}.yaml'.format(self.__class__.__name__,
self._testMethodName)
fp = os.path.join(ASSETDIR, fn)
_kwargs = self._getVcrArgs()
_kwargs.update(kwargs)
myvcr = vcr.VCR(**_kwargs)
cm = myvcr.use_cassette(fp)
return cm
def test_requests(self):
with self.getVcr(filter_query_parameters=FILTER_QUERY_PAREMETERS):
resp = requests.get(URL)
d = resp.json()
self.assertIsInstance(d, dict)
def test_aiohttp(self):
async def coro():
with self.getVcr(filter_query_parameters=FILTER_QUERY_PAREMETERS):
async with aiohttp.ClientSession() as session:
async with session.get(URL) as response:
text = await response.text()
return text
result = asyncio.run(coro())
d = json.loads(result)
self.assertIsInstance(d, dict)
This is the requests output which does not have the filtered value present in any vcrpy constructs:
interactions:
- request:
body: null
headers:
Accept:
- '*/*'
Accept-Encoding:
- gzip, deflate
Connection:
- keep-alive
User-Agent:
- python-requests/2.23.0
method: GET
uri: https://httpbin.org/get?arg1=val1
response:
body:
string: "{\n \"args\": {\n \"arg1\": \"val1\", \n \"sometoken\": \"sekrit\"\n
\ }, \n \"headers\": {\n \"Accept\": \"*/*\", \n \"Accept-Encoding\":
\"gzip, deflate\", \n \"Host\": \"httpbin.org\", \n \"User-Agent\":
\"python-requests/2.23.0\", \n \"X-Amzn-Trace-Id\": \"Root=1-5e849e8a-908aad86267b92fe372fa1da\"\n
\ }, \n \"origin\": \"8.8.8.8\", \n \"url\": \"https://httpbin.org/get?arg1=val1&sometoken=sekrit\"\n}\n"
headers:
Access-Control-Allow-Credentials:
- 'true'
Access-Control-Allow-Origin:
- '*'
Connection:
- keep-alive
Content-Length:
- '385'
Content-Type:
- application/json
Date:
- Wed, 01 Apr 2020 14:00:42 GMT
Server:
- gunicorn/19.9.0
status:
code: 200
message: OK
version: 1
This is the aiohttp response which does include the unfiltered token in the url key of the response:
interactions:
- request:
body: null
headers: {}
method: GET
uri: https://httpbin.org/get?arg1=val1
response:
body:
string: "{\n \"args\": {\n \"arg1\": \"val1\", \n \"sometoken\": \"sekrit\"\n
\ }, \n \"headers\": {\n \"Accept\": \"*/*\", \n \"Accept-Encoding\":
\"gzip, deflate\", \n \"Host\": \"httpbin.org\", \n \"User-Agent\":
\"Python/3.7 aiohttp/3.6.0\", \n \"X-Amzn-Trace-Id\": \"Root=1-5e849e89-2fdc0c40f507cd205231bb40\"\n
\ }, \n \"origin\": \"8.8.8.8\", \n \"url\": \"https://httpbin.org/get?arg1=val1&sometoken=sekrit\"\n}\n"
headers:
Access-Control-Allow-Credentials: 'true'
Access-Control-Allow-Origin: '*'
Connection: keep-alive
Content-Length: '387'
Content-Type: application/json
Date: Wed, 01 Apr 2020 14:00:41 GMT
Server: gunicorn/19.9.0
status:
code: 200
message: OK
url: https://httpbin.org/get?arg1=val1&sometoken=sekrit
version: 1
Edit:
This behavior was tested against the current master of the vcrpy project, but has also been observed in the latest 4.x release as well.
Just wanted to note here that this can be remedied "in the mean time" with something like this:
def filter_response(response):
response["url"] = '' # hide the URL
return response
..., and then making your own VCR instance, passing filter_response to the before_record_response kwarg. This doesn't seem to impact the later response playback in any negative way.
Here's a more elaborate implementation I've been using that does the actual response["url"] query param filtering:
https://github.com/scop/pytekukko/blob/c43533c8c8ff46f5fc1114e0d19a9cde58c89ae9/tests/test_pytekukko.py#L36-L50
(Nowhere near perfect: it expects replacements, does not do removal, and does not preserve more than one instance of a param, but works for my use case.)
Haven't tried just emptying the whole URL myself, nice if that works, but this one FTR in case someone finds a problem with the emptying approach.