django-clickhouse-backend icon indicating copy to clipboard operation
django-clickhouse-backend copied to clipboard

[BUG] - Package taking time to render response.

Open vigneshshettyin opened this issue 1 year ago • 7 comments

Describe the bug Observed a case where when the query string is long, Package is taking more time to render data. Also ran a profiler and checked that database is returning result instantly (less than 500ms) but django is taking more 30-35 seconds.

Query string length is 575316 characters, this is raw SQL length

With lower query string it works fine.

To Reproduce Trigger API with higher query string (approx more than 575316)

Expected behavior Django should return response within 1-2 seconds

Versions

  • ClickHouse server version. Version can be obtained by running SELECT version() query. - 23.12.1.956
  • Python version. - 3.11
  • Clickhouse-driver version. 0.26
  • Django version. - 4.2
  • Django clickhouse backend version. 1.14

image (1) image

vigneshshettyin avatar Dec 21 '23 11:12 vigneshshettyin

Is it caused by a SELECT query filtering JSON field?

jayvynl avatar Dec 22 '23 00:12 jayvynl

Is it caused by a SELECT query filtering JSON field?

No, we don't have JSON field @jayvynl

vigneshshettyin avatar Dec 22 '23 05:12 vigneshshettyin

Let us know if you need more info, we can share. @jayvynl

vigneshshettyin avatar Dec 22 '23 05:12 vigneshshettyin

@vigneshshettyin What is the query like

jayvynl avatar Dec 22 '23 07:12 jayvynl

WITH "cte" AS (
        SELECT "test_data"."CMO_ID",
               uniq("test_data"."CLP_ID") AS "no_clp",
               uniq("test_data"."PATIENT_ID") AS "no_clp"
          FROM "test_data"
         WHERE "test_data"."CODE" IN (..............)
         GROUP BY "test_data"."CMO_ID"
       ) SELECT "entity_ui_ch_entityagg"."id",
       "entity_ui_ch_entityagg"."address",
       "entity_ui_ch_entityagg"."city",
       "entity_ui_ch_entityagg"."dimid_c",
       "entity_ui_ch_entityagg"."dx_list",
       "entity_ui_ch_entityagg"."list_filter",
       "entity_ui_ch_entityagg"."full_name",
       "entity_ui_ch_entityagg"."cmo_id",
       "entity_ui_ch_entityagg"."dimid_m",
       "entity_ui_ch_entityagg"."state",
       "entity_ui_ch_entityagg"."data_tax",
       COALESCE("cte"."no_clp", 0) AS "no_clp",
       COALESCE("cte"."no_clp", 0) AS "no_clp"
  FROM "entity_ui_ch_entityagg"
  LEFT OUTER JOIN "cte"
    ON "entity_ui_ch_entityagg"."cmo_id" = ("cte"."CMO_ID")
 ORDER BY 12 DESC
 LIMIT 10

Pls, find the above query, where CODE is the dynamic filter (CODE eg are like PLO92, PLO393, HJO93 ...). When the number of codes are like 10,000 django app performs well. But when the number of codes are more than 40,000 we are facing this issue.

image (2)

@jayvynl

vigneshshettyin avatar Dec 22 '23 08:12 vigneshshettyin

As shown in your profiling, clickhouse_driver.varint.read_varint take most of the execution time. The problem is not caused by this project.

  1. Check if Django queryset generate expected SQL query, check if Django gives the correct result.
  2. Test the same query with another clickhouse driver, for example: clickhouse http endpoint/clickhouse go driver. If they are significantly faster than clickhouse_driver, then your can open an issue to clickhouse_driver.
  3. I noticed that clickhouse_driver.varint.read_varint have been called by 17098 times, it is unusual because the query result is limited to 10 rows. I have tested a query of 10 rows result, the fuction is only called by 106 times.
  4. I also noticed that clickhouse_driver.varint.read_varint function spent 4ms per call in your environment. In my local test environment it only spend 0.04ms per call. This difference may be caused by network delay. Your should also take network delay into account.

jayvynl avatar Dec 22 '23 16:12 jayvynl

Ok, let me raise a issue ticket for https://github.com/mymarilyn/clickhouse-driver

vigneshshettyin avatar Dec 23 '23 05:12 vigneshshettyin