apm-agent-python icon indicating copy to clipboard operation
apm-agent-python copied to clipboard

Repeating query in django view that failed due to OperationalError (timeout)

Open Pokapiec opened this issue 1 year ago • 8 comments

Describe the bug: When in django view some expensive query gets executed and it exceeds statement_timeout, OperationalError occurs and it makes repeatedly the same query X times (where X is probably the amount of stack traces pointing to code parts attached to QuerySet django class). Basically from what i've seen in the elasticapm codebase, elasticapm.utils.encoding.transform function executes repr internal python func on QuerySet class and under the hood, it refetches the whole query.

In our project we use forked version of elastic-apm library with small change where we add info about current tenant to each transaction in celery.

To Reproduce

  1. Write django view where you execute some ORM SELECT query
  2. Set statement_timeout session variable to some low value like 1ns to simulate OperationalError (SET LOCAL statement_timeout='1ns')
  3. Log all sql statements executed in view in django server logs (see repeating statement)

Environment (please complete the following information)

  • OS: Debian 12.6
  • Python version: 3.12.4
  • Framework and version: Django 4.2.15
  • APM Server version: 8.8.1
  • Agent version: 6.13.1

Additional context

Add any other context about the problem here.

  • Agent config options
Click to expand
ELASTIC_APM = {
    "ENABLED": USE_ELASTIC_APM,
    "SERVICE_NAME": "tenants",
    "SECRET_TOKEN": config("ELASTIC_APM_SECRET_TOKEN", cast=str),
    "SERVER_URL": config("ELASTIC_APM_SERVER_URL", cast=str),
    "ENVIRONMENT": config("ELASTIC_APM_ENVIRONMENT", cast=str, default="production"),
    "TRANSACTION_SAMPLE_RATE": config("ELASTIC_APM_TRANSACTION_SAMPLE_RATE", cast=float, default="0.01"),
    "PROCESSORS": (
        "core.elasticapm.sanitize_email",
        "core.elasticapm.sanitize_cookie",
        "core.elasticapm.add_transaction_sampling",
        "core.elasticapm.drop_unsampled_spans",
        "core.elasticapm.add_transaction_url_name",
        "core.elasticapm.add_sql_context",
    ),
    "DEBUG": config("ELASTIC_APM_DEBUG", cast=bool, default=False),
    "CAPTURE_HEADERS": False,  # No need for HTTP headers
    "LOG_LEVEL": "off",  # We caputre logs not via APM but via filebeat
    "AUTO_LOG_STACKS": False,  # Exceptions stack traces are kept in sentry
}
- `pyproject.toml`:
Click to expand
[tool.poetry.dependencies]
python = "~3.12"

aioapns = "3.2"
beautifulsoup4 = "~4.12"
bleach = "~6.1"
boto3 = "~1.34"
celery = "5.4.0"
chardet = "~5.2"
dateparser = "~1.2"
Django = "4.2.15"
djangorestframework = "~3.15"
django-cacheops = "7.0.2"
django-countries = "~7.4"
django-cors-headers = "~4.4.0"
django-filter = "~23.1"
django-fsm = "~2.8"
django-ipware = "~5.0"
django-mptt = "~0.14"
django-ordered-model = "~3.7"
django-recaptcha = "~3.0"
django-storages = "~1.13"
dnspython = "~2.6"
drf-yasg = "~1.21"
easy-thumbnails = "~2.8"
factory-boy = "~3.3"
fcm-django = "~2.1.0"
geoip2 = "~4.1.0"
google-api-python-client = "~1.12.8"
hashids = "~1.3.1"
html2text = "~2020.1.16"
IMAPClient = "~2.3"
lupa = "2.1"
messente-api = "1.5.0"
msal = "~1.25"
msgpack = "~1.0.0"
oauth2client = "~4.1.3"
phonenumberslite = "~8.12.3"
psycopg2-binary = "~2.9.5"
python-decouple = "~3.4"
python-facebook-api = "0.18.0"
python-json-logger = "2.0.2"
python-magic = "~0.4"
python-slugify = "~4.0.1"
pytz = "~2021.3"
pywebpush = "~1.11.0"
redis = "4.5.4"
sentry-sdk = "1.45.0"
ShopifyAPI = "9.0.0"
smsapi-client = "2.9.6"
tqdm = "~4.66"
twilio = "9.2.3"
ujson = "~5.8"
user-agents = "~2.2.0"
google-auth = "2.29.0"
# used by third-party apps
Brotli = "~1.0.9"
clicksend-client = "^5.0.72"
html5lib = "~1.1"
lxml = "~5.1.0"
Pillow = "~10.3"
celery-redbeat = "2.2.0"
ipdb = "0.13.4"
ipython = "~8.18"
libmagic = "~1.0"
ndg-httpsclient = "~0.5.1"
uWSGI = "2.0.26"

[tool.poetry.group.internal.dependencies]
# This is just version of elastic-apm=6.13.1 but with minor patch
elastic-apm = { git = "https://poetry:[email protected]/developers/elasticapm_tenants.git", tag = "v6.13.1-our" }

[tool.poetry.dev-dependencies]
argh = "0.31.2"
django-debug-toolbar = "*"
django-extensions = "~3.2.0"
pyroscope-io = "~0.8"
tblib = "*"
watchdog = "2.1.9"
black = "22.12.0"
coverage = "~7.0.0"
dotmap = "~1.3.8"
fakeredis = "~2.22"
flake8 = "6.0.0"
safety = "~3.2"
selenium = "~3.141.0"
sphinx = "6.2.1"
recommonmark = "0.7.1"
nbsphinx = "0.8.2"
sphinx-rtd-theme = "0.5.1"
sphinx-autobuild = "2021.3.14"

Pokapiec avatar Sep 02 '24 11:09 Pokapiec

Thanks for reporting. I think this is expected given how the Django QuerySet works, they are lazy and evaluated only when used. Said that some special casing for the case may be possible if the QuerySet has not been already evaluated and there is a way to know it is not already cached?

xrmx avatar Sep 03 '24 09:09 xrmx

If the query to database was too heavy to be evaluated there is no way it was cached. You can only cache QuerySet that is computable (it's execution time fits in statement timeout).

Pokapiec avatar Sep 03 '24 13:09 Pokapiec

If the query to database was too heavy to be evaluated there is no way it was cached. You can only cache QuerySet that is computable (it's execution time fits in statement timeout).

Exactly, so if we don't find it cached it means we should not evaluate it.

xrmx avatar Sep 03 '24 19:09 xrmx

Are you saying that there is some kind of safe-guard against evaluating non-cached QuerySets ? because i could not find such condition in library source code. This util function elasticapm.utils.encoding.transform does not consider if object is of type QuerySet and therefore it tries evaluating QuerySet (at this line https://github.com/elastic/apm-agent-python/blob/main/elasticapm/utils/encoding.py#L149) repeatedly for every stack trace regardless of the cache in it.

If it gives more context, the process of repeating SQL queries happens when got_request_exception django signal is being sent.

Pokapiec avatar Sep 04 '24 07:09 Pokapiec

Are you saying that there is some kind of safe-guard against evaluating non-cached QuerySets ?

No, I'm saying that it may be possible to add

xrmx avatar Sep 04 '24 07:09 xrmx

@xrmx what is the status of this issue ? Do you know if it will be corrected in near future ?

Pokapiec avatar Oct 22 '24 06:10 Pokapiec

@Pokapiec If you are going to open a PR I would gladly review it

xrmx avatar Oct 22 '24 07:10 xrmx

@xrmx i created a pull request https://github.com/elastic/apm-agent-python/pull/2158 could you look at it ?

Pokapiec avatar Dec 02 '24 07:12 Pokapiec