sentry-python
'User' object has no attribute 'is_verified'
Environment
SaaS (https://sentry.io/)
Steps to Reproduce
Versions:
Django: 4.2.15
uwsgi: 2.0.24
sentry-sdk: 2.13.0
django-otp: 1.5.1
- Setup django-otp: https://django-otp-official.readthedocs.io/en/stable/overview.html#installation
- Use django_otp.admin.OTPAdminSite as the custom admin site:

  # urls.py
  from django.urls import path
  from django_otp.admin import OTPAdminSite

  admin_site = OTPAdminSite()
  urlpatterns = [path("admin/", admin_site.urls)]

- Enable profiling by setting profiles_sample_rate in sentry_sdk.init (a minimal sketch of the init follows this list).
- When the uwsgi server starts, the first request to the admin fails with the error: 'User' object has no attribute 'is_verified'
- If you remove the profiles_sample_rate from sentry_sdk.init, the error goes away.
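For reference, a minimal sketch of the relevant init (the DSN and sample rates below are placeholders, not our real configuration):

import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[DjangoIntegration()],
    traces_sample_rate=1.0,  # profiles are only collected for sampled transactions
    profiles_sample_rate=1.0,  # removing this line makes the error go away
)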
Expected Result
Profiling should not change the behaviour of middleware, so I would expect django_otp.middleware.OTPMiddleware to successfully set the is_verified attribute on the User object.
Actual Result
django_otp.middleware.OTPMiddleware somehow fails to set User.is_verified for the first request to the admin. Subsequent requests work fine; only the initial request to the admin fails.
Product Area
Profiling
Link
No response
DSN
No response
Version
No response
Assigning to @getsentry/support for routing ⏲️
@omerfarukabaci, I am having trouble reproducing your issue from the information you have provided. Could you please provide your source code (if it is public), or create a minimal reproduction that you can share with me?
Hey @szokeasaurusrex, thank you for your quick answer! Unfortunately our source code is not public. I tried to create a minimal reproduction but couldn't manage to, since the issue is flaky and does not happen all the time. But I have some events on Sentry; would that work? In the traces of those events, you can see that OTPMiddleware has been called but the is_verified attribute is not set.
Hey @omerfarukabaci, any additional info would help, even just the full stack trace. Sentry events too -- feel free to put the links to them here or if you prefer, send them to me on Discord (sentrivana) or to my email (ivana.kellyer @ sentry.io). Thanks!
Hey @sentrivana, I have sent the Sentry event links to your email. There is one other issue that started appearing after enabling profiling; I am not 100% sure it is directly related to profiling, but it seems like it, so I added a link to that in the email as well, just in case.
If you need further assistance from my side, please let me know; I can override the OTPAdminSite and retrieve extra information when this exception occurs. I can also provide the uwsgi configuration file if that helps.
Thanks a lot for your prompt responses! 🚀
Sure thing, that's why we're here :) Thanks for the quick follow ups!
I have sent Sentry event links to your email.
So I've taken a look at the Sentry events and while the stacktraces provide some extra context, I'm still not sure what's going on, especially why profiling specifically would be causing this.
The facts we have so far:
- django-otp patches the User object here, adding is_verified on it (a simplified sketch of this mechanism follows the list)
- Django tries to read is_verified here, with _wrapped being a User instance, and fails, indicating the middleware has not run or the user object was somehow overwritten after the fact
- The error only happens on the first request
- Turning off profiling gets rid of the issue
- The profiler itself doesn't do any additional patching, so it shouldn't be the case that it's overwriting the django-otp patch
- The only thing I can think of for why the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far fetched)
- Setting the profiler aside, our Django integration does an extra call to User.is_authenticated here. I'm wondering if this has anything to do with the issue.
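To illustrate the first point, here's a heavily simplified sketch of the mechanism (not the actual django-otp source, just an approximation of how the lazy patching works):

import functools
from django.utils.functional import SimpleLazyObject

def _is_verified(user):
    # django-otp considers a user verified once an OTP device has been matched
    return user.otp_device is not None

class SimplifiedOTPMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        user = getattr(request, "user", None)
        if user is not None:
            # request.user is re-wrapped lazily; is_verified is only attached
            # once something evaluates the lazy object for the first time
            request.user = SimpleLazyObject(functools.partial(self._verify_user, user))
        return self.get_response(request)

    def _verify_user(self, user):
        user.otp_device = None
        user.is_verified = functools.partial(_is_verified, user)
        return user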
Some follow-ups:
- Just to cover the bases, django-otp is properly in your INSTALLED_APPS and, more importantly, MIDDLEWARE, right?
- Could you show me your sentry_sdk.init?
There is one other issue that started appearing after enabling profiling, I am not 100% sure that it is directly related with profiling but seems like it, also added link to that to the email just in case.
I had a quick look but don't have many insights -- if you keep seeing this and it indeed looks like a profiler/SDK issue, can you please open a new issue just with that? Makes it easier for us to keep track. :)
@sentrivana Thanks for the detailed response!
Some comments about the facts:
The error only happens on the first request
Now I have realized that this is not the case, sorry for the initial misinformation. To test this I opened up our QA and Sandbox admin pages and refreshed them tens of times, and eventually I could see the same error again, so it doesn't only happen on the first request but occurs occasionally.
Turning off profiling gets rid of the issue
I have checked the deployments just before we encountered this issue for the first time, and the issue was introduced by a deployment that includes:
- Upgrading sentry-sdk from 1.40.6 to 2.7.1
- Setting profiles_sample_rate=0.3

The issue then disappeared after I removed profiles_sample_rate along with some other changes:
- Setting py-call-uwsgi-fork-hooks=true in the uwsgi conf (enable-threads was already true)
- Upgrading sentry-sdk from 2.7.1 to 2.13.0

Then, after one week of not seeing the issue, I enabled profiling again and we immediately started seeing this issue again, which is why I strongly suspect it is related to profiling.
The only thing I can think of why the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far fetched)
Is this sampling being executed in another thread? If you look at the traces of the issues happening on the same page with ids:
- 9c4203a666d24a848f1980f236a48671
- 4fd9a500f7a749109e9fbaea88e42232
You can see that the error happens at different times. Also, the span that raises the error looks like event.django - sentry_sdk.integrations.django._got_request_exception, which is something I don't see in other issues; normally it is a span related to our own code that raises the error. Can there be a race condition between Sentry and django-otp during the evaluation of the lazy request.user object? What I mean is: could this line somehow be overriding the request.user that was patched by django-otp, maybe after profiling forces the evaluation of request.user?
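To illustrate what I mean by the lazy evaluation, here is a minimal standalone sketch (not our actual code):

from django.utils.functional import SimpleLazyObject

class FakeUser:
    pass

def load_user():
    # stand-in for the real lazy user loading in Django / django-otp
    print("user is being evaluated now")
    return FakeUser()

lazy_user = SimpleLazyObject(load_user)
# Nothing has been evaluated yet; the *first* attribute access, from whichever
# code path (or thread) gets there first, triggers load_user().
print(hasattr(lazy_user, "is_verified"))  # prints "user is being evaluated now", then False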
Follow-ups
Just to cover the bases, django-otp is properly in your INSTALLED_APPS and, more importantly, MIDDLEWARE, right?
Yes, copy & pasting from the settings source code:
INSTALLED_APPS = [
    ...,
    "django_otp",
    "django_otp.plugins.otp_static",
    "django_otp.plugins.otp_totp",
]

MIDDLEWARE = [
    m
    for m in [
        ...
        ("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
    ]
    if m
]
I thought that maybe we somehow fail to set or change ROLE, but you can see in the trace that OTPMiddleware is being called, so that shouldn't be the case.
Could you show me your sentry_sdk.init?
Copy & pasting the whole sentry config from the settings source code:
import os
import re

SENTRY_DSN = os.environ.get("SENTRY_DSN")
SENTRY_ENV = os.environ.get("SENTRY_ENV")

if SENTRY_DSN:
    if not SENTRY_ENV:
        raise RuntimeError(
            "SENTRY_ENV env var needs to be defined (use role name, such as 'admin' or 'buyer_console')"
        )

    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    def traces_sampler(sampling_context):
        path = sampling_context.get("wsgi_environ", {}).get("PATH_INFO")
        method = sampling_context.get("wsgi_environ", {}).get("REQUEST_METHOD")
        if path == "/favicon.ico":
            return 0
        if path.startswith("/static/"):
            return 0
        if path == "/health/":
            return 0.001
        if re.match(r"/v1\/api\/some-path\/\S+\/$", path) and method == "GET":
            return 0.001
        if path == "/v1/api/token/":
            return 0.001
        return 1

    sentry_sdk.init(
        dsn=SENTRY_DSN,
        integrations=[DjangoIntegration()],
        send_default_pii=True,
        traces_sampler=traces_sampler,
        profiles_sample_rate=1.0,
        environment=SENTRY_ENV,
    )
Thanks @omerfarukabaci. Just to rule out another possibility first: you are setting the authentication middleware before the OTP one, like this, right? https://stackoverflow.com/a/77109153
# "settings.py"
MIDDLEWARE = [
...
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django_otp.middleware.OTPMiddleware' # Here
]
Hey @sl0thentr0py, you're welcome! Yes, it is set as follows:
MIDDLEWARE = [
...
"django.contrib.auth.middleware.AuthenticationMiddleware",
("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
]
Thanks for helping us investigate, this is very much appreciated! ❤
Is this sampling being executed in another thread? If you look at the traces of the issues happening on the same page with ids 9c4203a666d24a848f1980f236a48671 and 4fd9a500f7a749109e9fbaea88e42232, you can see that the error happens at different times. Also, the span that raises the error looks like event.django - sentry_sdk.integrations.django._got_request_exception, which is something I don't see in other issues; normally it is a span related to our own code that raises the error. Can there be a race condition between Sentry and django-otp during the evaluation of the lazy request.user object?
This is sort of the direction I was thinking. One thing to test whether this has anything to do with the SDK accessing the user is to set send_default_pii=False for a bit and enable the profiler. Setting send_default_pii to False should disable the code path that leads to us capturing request.user. Could you try that out and let us know if the error still happens?
You're welcome, it is my pleasure, and also thank you for your concern! ❤️ Unfortunately I am on vacation for 3 weeks; I will try that and let you know when I am back. 🚀
Hey @sentrivana, @sl0thentr0py
I am finally back. 🙂 My colleagues changed send_default_pii to False, but unfortunately the issue still persists. Anything else I can do to investigate this further?
Hey @omerfarukabaci, welcome back 👋🏻
So much for my hypothesis 😢 So it seems like it doesn't have anything to do with the SDK accessing the user data (at least not the _set_user_info part) since that code path shouldn't be triggered at all if PII sending is False.
Based on how inconsistently it happens, it indeed looks like a possible race condition. I understand that it's super fickle to repro, but I don't have any ideas on how to debug this further without having an example app that runs into the issue.
Hey @sentrivana, thank you! 👋🏼
Since this issue happens every day in our QA environment, we are actually able to reproduce it. If you could tell me what else you need to know, I can update the code on QA to gather more information. I can also fork the SDK and manipulate its source code to learn more about the problem. Trying to reproduce it in a separate app is also possible, but that requires more work since we don't have much idea about the actual cause.
So the problem as far as I can tell is either that request.user is modified/overwritten after the OTP middleware has patched it, or the middleware doesn't even run (not that likely).
What we could look at, during a request when the error happens:
- Can we verify that the django-otp middleware has actually run and patched the user?
- Can we pinpoint when the user object has changed, or at least narrow down points in time when it looks correct vs. when it's already wrong? (E.g. check what it looks like in the profiler before and after it's taken a sample, since this is our most likely culprit at the moment.)
What would also be interesting would be comparing this to a working request.
Hey @sentrivana,
What do you think about the following middleware, which monkey-patches the request object's attribute getter and setter and logs a traceback whenever the user attribute is set or accessed:
import logging
import traceback

LOGGER = logging.getLogger(__name__)


class CustomRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def log_traceback_for_user_action(self, action, attr):
        # Log the full call stack whenever request.user is read or written.
        if attr == "user":
            LOGGER.info(f"[NDV-1622] request.user is being {action} by:")
            for frame in traceback.extract_stack():
                LOGGER.info(f"\t[NDV-1622] {frame.filename}:{frame.lineno} {frame.name}")

    def __call__(self, request):
        def custom_setattr(obj, attr, value):
            self.log_traceback_for_user_action("set", attr)
            return object.__setattr__(obj, attr, value)

        def custom_getattr(obj, attr):
            self.log_traceback_for_user_action("get", attr)
            return object.__getattribute__(obj, attr)

        # Patch the request class (not just this instance), so every attribute
        # access on any request goes through the logging wrappers above.
        request.__class__.__setattr__ = custom_setattr
        request.__class__.__getattribute__ = custom_getattr
        return self.get_response(request)
I am planning to add this as the first middleware, and deploy to QA and try to reproduce the issue. Then I will look for any sign of sentry-sdk. I know it is a bit tricky but I think that could work, also I am kinda out of options.
@omerfarukabaci Nice! Yeah I think that could give us a lot of insight, let's try it.
Hey @sentrivana,
Finally, I found some time to deploy the middleware above to QA. I have sent two log files for both a successful and a failed request to your email address ([email protected]). Unfortunately it still doesn't make any sense to me. 😕 I hope it somehow gives you some insight.
If you could think of anything else I can do, please let me know.
Thanks @omerfarukabaci, I've had a cursory look. Nothing jumps out at me right off the bat; I can't find any substantial difference between a successful request and a failed one, but I will diff this more properly in the next couple of days and report back.
Thank you @sentrivana, waiting for your response! 🚀
I took a look at the outputs and here are some more observations. I'm still not sure it's a sentry problem but it could be that just pausing every 10 ms for collecting profiler samples could be making another race condition between django and django-otp trigger more often.
- this getattr in _set_user_info is completely missing from both file outputs, so I'm like 90% sure this is not about Sentry's Django integration and that user-fetching code.
- we need to find out when this _verify_user does not get called, since this is the lazy code that actually sets the is_verified partial.

also, after reading this code, imo this partial is_verified is kinda bad design: it is a method that is set lazily, so at the very least they should check for its presence when calling it, or not set it lazily.
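Just to illustrate that kind of presence check (a hedged sketch, not an official recommendation and not a fix for the underlying issue), an admin site could tolerate a missing is_verified instead of raising:

import logging

from django_otp.admin import OTPAdminSite

LOGGER = logging.getLogger(__name__)


class DefensiveOTPAdminSite(OTPAdminSite):
    def has_permission(self, request):
        # If OTPMiddleware has not attached is_verified (the symptom in this
        # thread), deny access and log instead of raising AttributeError.
        if getattr(request.user, "is_verified", None) is None:
            LOGGER.warning("request.user has no is_verified attribute")
            return False
        return super().has_permission(request)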
Hey @sl0thentr0py, thank you for your insights! I totally agree with your points.
I was wondering: to make sure it is (or isn't) related to Sentry profiling, can I manually call the code that collects profiler samples? What I am planning to do is call that function to make the error happen all the time instead of at random. I can also use a fork of the SDK to manipulate the source code, although I would need a bit of help there. What do you think?
Hey @omerfarukabaci
I just read the whole thread and I would like to help.
I was wondering that, to make sure it is (or it isn't) related with Sentry profiling, can I manually call the code that collects profiler samples? What I am planning to do is to call that function to cause the error happen all the time instead of random. I can also use a fork of the SDK to manipulate the source code, although I need a bit of help here. What do you think?
There is a global var _scheduler that holds an object of type ThreadScheduler (or GeventScheduler, depending on your setup). You can call _scheduler.sampler(), which should take a sample of the stack. But this function is called on a regular basis when the profiler is enabled, so I am not sure if this helps.
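A rough sketch of what calling it manually from a middleware could look like (note that _scheduler is an SDK internal, so the import paths below are assumptions and may differ between versions):

try:
    # sentry-sdk 2.x layout (assumed); internal module, not a public API
    from sentry_sdk.profiler import transaction_profiler as _profiler_module
except ImportError:
    # older layout (assumed)
    import sentry_sdk.profiler as _profiler_module


class ForceProfilerSampleMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Look the scheduler up at call time, since the global is only set
        # once the profiler has been started.
        scheduler = getattr(_profiler_module, "_scheduler", None)
        if scheduler is not None:
            scheduler.sampler()  # take one sample of the current frame stacks
        return self.get_response(request)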
Other things that came to my mind:
If the problem is that is_verified is not there, you could check:
- Is there any other middleware that messes with the user object?
- Do you have a field is_verified in your User model that clashes with the is_verified property of django-otp? (Maybe the user is saved with is_verified=None and thus the is_verified partial is deleted?)
Hello @antonpirker, sorry for my late response; things have been hectic for a while.
Do you have a field is_verified in your User model that clashes with the is_verified property of django-otp? (maybe user is saved with is_verified=None and thus the is_verified partial is deleted?)
No, there is neither a model field nor any other attribute named is_verified defined on the User model.
Is there any other middleware that messes with the user object?
Well, we have some middleware that uses global variables, and one of them reads request.user, but it runs after OTPMiddleware is executed. I have also tried removing this middleware, but the issue still persists.
I have also tried manually calling _scheduler.sampler() from a custom middleware both before and after OTPMiddleware, but no luck; I couldn't reproduce the issue.
I might have something new. After investigating the sentry_sdk profiling source code, I realized the sampler (used within _scheduler.sampler()) calls extract_stack. This function utilizes an LRUCache and caches the frame object, which includes the f_locals and f_globals dictionaries. Considering threads share memory, could Sentry's profiling thread somehow be overriding the application thread's request.user with a cached request.user that has not been manipulated by OTPMiddleware yet?
I also wonder: does sentry-sdk use threading by default, or is it only used for profiling?
cc @Zylphrex can the cached frame somehow cause state race conditions in weird scenarios?
To the best of my knowledge, no. This is because we only cache the extracted attributes of a frame, meaning the actual frame object itself is not stored between samples.
One idea that might shed some light on this issue is to inject a middleware
- immediately after django_otp.middleware.OTPMiddleware
- at the end of the list of middleware

to check if is_verified exists on the user object (an example placement follows the snippet below).
Perhaps something along the lines of
import logging

LOGGER = logging.getLogger(__name__)


class LogIsVerifiedMiddleware:
    def __init__(self, get_response=None):
        self.get_response = get_response

    def __call__(self, request):
        user = getattr(request, 'user', None)
        if user is None:
            LOGGER.info("No user found")
        else:
            has_is_verified = hasattr(user, "is_verified")
            if has_is_verified:
                LOGGER.info("User has is_verified")
            else:
                LOGGER.info("User does not have is_verified")
        return self.get_response(request)
This will help confirm at which point the method disappears or if it was ever there.
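For example, registered twice under a hypothetical dotted path (adjust myproject.middleware to wherever you put the class):

MIDDLEWARE = [
    # ...
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django_otp.middleware.OTPMiddleware",
    "myproject.middleware.LogIsVerifiedMiddleware",  # immediately after OTPMiddleware
    # ... any remaining middleware ...
    "myproject.middleware.LogIsVerifiedMiddleware",  # again at the very end
]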
Hey @Zylphrex, thank you for your answer! 🚀
The middleware you have provided will cause the lazy request.user object to be evaluated, so I am not sure whether it would work as expected. But I am fairly sure that is_verified is there after all pre-response middleware hooks have been called. Below is the call stack for the error:
django/utils/functional.py in inner at line 268
django_otp/admin.py in has_permission at line 69
django/contrib/admin/sites.py in each_context at line 336
django/contrib/admin/sites.py in index at line 555
django/contrib/admin/sites.py in inner at line 242
django/views/decorators/cache.py in _wrapped_view_func at line 62
django/utils/decorators.py in _wrapped_view at line 133
django/contrib/admin/sites.py in wrapper at line 261
django/core/handlers/base.py in _get_response at line 197
django/core/handlers/exception.py in inner at line 55
If you look at django/contrib/admin/sites.py:inner, you can see that before this line is executed, has_permission has already been called once without any errors. The error occurs when has_permission is called again on this line, which is really weird.
I will again disable profiling for a while to make sure this is really a Sentry issue.
I will again disable profiling for a while to make sure this is really a Sentry issue.
Thanks @omerfarukabaci, let us know once you have more info
Hey @szokeasaurusrex,
Sentry profiling has been disabled for the last 4 days, and the issue has gone away again. We have one other issue that appeared after enabling profiling. I didn't mention it previously because I wasn't sure whether it is related to Sentry or not, but now, looking at the stack trace in detail, it stems from the Sentry profiling source code and looks like it has something to do with the lazy User object:
At the end this causes an error in MySQL:
Looking at the above stack trace, I am pretty sure that Sentry profiling is evaluating the lazy User object at some point. The issue ID is 5967530499; if this is not enough, just let me know and I can provide you the full link.
Sorry for not bringing up this issue previously; I didn't see (or forgot) that it comes from a Sentry thread. 🙂 🚀
@omerfarukabaci The link would be helpful :)