Python logger integration leaks secrets from repr of aiohttp exceptions
How do you use Sentry?
Sentry Saas (sentry.io)
Version
sentry-sdk[fastapi]==1.31.0
Steps to Reproduce
We have an application logger configured to show the str of exception. The code is similar to:
async def _process(self, notify: psycopg.Notify) -> None:
# ...
try:
await endpoint.process(self, notify)
except Exception as e:
logger.error("error processing notify %s: %s", notify, e)
where endpoint.process() calls something that contains:
async with meth(url, headers=headers, **kwargs) as resp:
resp.raise_for_status()
A stand-alone script, but without sentry integration, to raise the same exception is the following:
import asyncio
import aiohttp
import logging
logger = logging.getLogger()
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s %(levelname)s %(message)s')
async def main():
async with aiohttp.ClientSession() as s:
resp = await s.get(
"https://example.com/api/v2/session/",
headers={"Authentication": "SECRET"},
)
try:
resp.raise_for_status()
except Exception as e:
logger.error("an error: %s", e)
if __name__ == "__main__":
asyncio.run(main())
Expected Result
Upon misconfigured authentication to an internal api, the program logged something like this in the server logs:
ERROR: error processing notify Notify(channel='payflow_event', payload='-1', pid=37): 401, message='Unauthorized', url=URL('http://auth.docker/api/v2/store/payflow/notify/')
or, for the example script:
2023-10-05 19:33:40,614 ERROR an error: 404, message='Not Found', url=URL('https://example.com/api/v2/session/')
with the str() of the error.
Actual Result
What sentry does is instead pass repr() information to the server, which leaks private information from the headers. In sentry web interface I see the error represented as:
Which is consistent with a repr that, in the test script, looks like:
2023-10-05 19:44:11,102 ERROR an error:
ClientResponseError(RequestInfo(url=URL('https://example.com/api/v2/session/'),
method='GET', headers=<CIMultiDictProxy('Host': 'example.com', 'Authentication': 'SECRET',
'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'User-Agent': 'Python/3.10
aiohttp/3.8.3')>, real_url=URL('https://example.com/api/v2/session/')), (), status=404,
message='Not Found', headers=<CIMultiDictProxy('Content-Encoding': 'gzip', 'Accept-Ranges':
'bytes', 'Age': '690', 'Cache-Control': 'max-age=604800', 'Content-Type': 'text/html;
charset=UTF-8', 'Date': 'Thu, 05 Oct 2023 17:44:11 GMT', 'Expires': 'Thu, 12 Oct 2023 17:44:11
GMT', 'Last-Modified': 'Thu, 05 Oct 2023 17:32:41 GMT', 'Server': 'ECS (dcb/7EC6)', 'Vary':
'Accept-Encoding', 'X-Cache': '404-HIT', 'Content-Length': '648')>)
...leaking SECRET in the header.
The logger asked for %s: there are reason why an user wants to see str(), not repr().
Can this behaviour be disabled? Shouldn't it be the default?
(I'm also baffled at how a psycopg.Notify, which is a typing.NamedTuple, got converted to a list-like representation...)
Hey @dvarrazzo ! Thanks for reporting this! I will have a look today!
Ok, I had a look.
tl;dr: The logging instrumentation serializes the parameters of the logging.error("Wrong pwd %s, exception: %s", "secret123", ex) command with our global serializer uses repr() to serialize objects. The repr() of an object (like ex) can have sensitive information like auth tokens in it.
Changing the serializer to str() is not an option because it would change the overall behavior of the SDK and would break everything.
The options I identified:
-
Add log values only when
send_default_pii=True(Falseby default). This will change the behavior of everyone using the logging integration. (so: everyone, because it is a default integration.) For most people the log parameters would disappear. People need to turn on PII sending globally for getting there log values back. -
Add a parameter to the logging integration called
add_log_parameters(default:False) This will change the behavior of everyone using the logging integration. (so: everyone, because it is a default integration.) For most people the log parameters would disappear. People can setadd_log_parameters=Trueto the logging integrations to get their values back. -
Using
str()explicitly in the logging integration (so therepr()from the serializer is not used) Solves this GH issue immediately. This will change the behavior for everyone using the logging integration, but if they just log values wherestr()andrepr()return the same) they will not notice There is no way people can opt-in/opt-out of the new behavior. -
Do not change the integration. Using features described in https://docs.sentry.io/platforms/python/data-management/sensitive-data/ to fix the issue. This will solve this issue, but does not make the Sentry SDK "safe by default".
The optimal solution is probably something in between. @sentrivana, @sl0thentr0py @szokeasaurusrex what are your thoughts?
There is also scrubbing of sensitive data happening on the server (on relay which is the tool we have to ingest events sent to Sentry.io). They have an open issue that is related: https://github.com/getsentry/relay/issues/2567 I commented there.
So @dvarrazzo the solution is that our server will improve scrubbing of sensitive data (see linked issue above) and you can add an EventScrubber or a before_send hook to remove the sensitive data.
@antonpirker thank you for looking into this.
I am fairly new with Sentry, and I still don't get entirely how scrubbing works. The way I have found this installation looks already relatively aggressive. In particular, looking at the same issue already reported here, there are also entries such as:
As a newcomer to Sentry, I don't know why these are scrubbed; however, they don't contain any sensitive info, and their censoring makes this report page less useful than it should be.
As far as I'm concerned, my approach to solve the problem in this project will be to wrap every exception passed to a logger into a str() call. This has a (negligible) performance impact, and a (non-negligible) cost for my client.
For what it's worth, I believe that Sentry is wildly overstepping the trust given by replacing the str() info requested on an object with repr() info, and then is trying to limit the damage adding a layer of magic on top of a poor choice.
Another solution would be to not use the logging integration and after your error log statement reraise the error and let our FastAPI integration capture it.
(The logging integration is one of the default integrations and can disabled with a flag to init(): https://docs.sentry.io/platforms/python/configuration/options/#default-integrations (all default integrations will be disabled, but you can add ones that you want to keep to the integrations parameter to init().)
The FastAPI integration will be auto enabled when you have the fastapi package in your project.
Moving this from "Bug" to "Improvement" because the system appears to be working as intended (though could perhaps be improved). I'll let others comment on whether we should "close as not planned", or whether this is going to kick off work someday on a specific change/improvement.
As the improvements to data scrubbing in relay have been released some time ago and we always will improve the data scrubbing, i will close this issue.
If you have any cases of sensitive data showing up in Sentry where you think it should not be showing up, please open a new issue.