
Timeouts in large ClickHouse eap_items_1_local

Open frederikspang opened this issue 3 months ago • 27 comments

Self-Hosted Version

25.9.0

CPU Architecture

x86_64

Docker Version

28.4.0

Docker Compose Version

2.39.2

Machine Specification

  • [x] My system meets the minimum system requirements of Sentry

Steps to Reproduce

  • Run 25.9.0
  • Large ingest over multiple days
  • Open "new" insights backend page.

Expected Result

/events-stats API calls return reasonably fast results.

Actual Result

Clickhouse error log:

2025.09.20 17:54:32.714831 [ 859 ] {randomized-2507e81dae5f4761806b3bfb335617f9} <Error> executeQuery: Code: 159. DB::Exception: Timeout exceeded: elapsed 25005.287 ms, maximum: 25000 ms. (TIMEOUT_EXCEEDED) (version 25.3.6.10034.altinitystable (altinity build)) (from 172.18.0.23:49096) (in query: SELECT (if(mapContains(attributes_string_19, 'sentry.op'), arrayElement(attributes_string_19, 'sentry.op'), NULL) AS `sentry.op_TYPE_STRING`), (if(mapContains(attributes_string_39, 'sentry.group'), arrayElement(attributes_string_39, 'sentry.group'), NULL) AS `sentry.group_TYPE_STRING`), (cast(project_id, 'Int64') AS `sentry.project_id_TYPE_INT`), (if(mapContains(attributes_string_36, 'sentry.normalized_description'), arrayElement(attributes_string_36, 'sentry.normalized_description'), NULL) AS `sentry.normalized_description_TYPE_STRING`), (divide(sumIfOrNull(multiply((if(mapContains(attributes_float_5, 'sentry.duration_ms'), arrayElement(attributes_float_5, 'sentry.duration_ms'), NULL) AS `sentry.duration_ms_TYPE_DOUBLE`), divide(1, sampling_factor)), mapContains(attributes_float_5, 'sentry.duration_ms') AND true), sumIfOrNull(divide(1, sampling_factor), mapContains(attributes_float_5, 'sentry.duration_ms') AND true)) AS `avg(span.duration)`), (divide(abs(minus(divide(plus((sumIfOrNull(multiply(`sentry.duration_ms_TYPE_DOUBLE`, sampling_weight), mapContains(attributes_float_5, 'sentry.duration_ms') AND true) AS `__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__sum`), (multiply(2.24, sqrt(sumIf(multiply(multiply(`sentry.duration_ms_TYPE_DOUBLE`, `sentry.duration_ms_TYPE_DOUBLE`), minus(multiply(sampling_weight, sampling_weight), sampling_weight)), mapContains(attributes_float_5, 'sentry.duration_ms') AND true))) AS `__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__sum_err`)), minus((sumIfOrNull(sampling_weight, mapContains(attributes_float_5, 'sentry.duration_ms') AND true) AS `__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__count`), (multiply(2.24, sqrt(sumIf(minus(multiply(sampling_weight, sampling_weight), sampling_weight), mapContains(attributes_float_5, 'sentry.duration_ms') AND true))) AS `__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__count_err`))), divide(minus(`__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__sum`, `__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__sum_err`), plus(`__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__count`, `__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg__count_err`)))), 2) AS `__snuba_custom_column__confidence_interval$avg(span.duration)$function_type:avg`), (divide(countIf(`sentry.duration_ms_TYPE_DOUBLE`, mapContains(attributes_float_5, 'sentry.duration_ms') AND true), sumIf(sampling_weight, mapContains(attributes_float_5, 'sentry.duration_ms') AND true)) AS `__snuba_custom_column__average_sample_rate$avg(span.duration)`), (countIf(`sentry.duration_ms_TYPE_DOUBLE`, mapContains(attributes_float_5, 'sentry.duration_ms') AND true) AS `__snuba_custom_column__count$avg(span.duration)`), (coalesce(if(mapContains(attributes_string_20, 'sentry.transaction'), arrayElement(attributes_string_20, 'sentry.transaction'), NULL), if(mapContains(attributes_string_3, 'transaction'), arrayElement(attributes_string_3, 'transaction'), NULL)) AS `sentry.transaction_TYPE_STRING`) FROM eap_items_1_local WHERE in(project_id, 
[2, 3, 5, 7, 8, 9, 10, 12, 13, 14, 17]) AND equals(organization_id, 1) AND less(timestamp, toDateTime(1758391201)) AND greaterOrEquals(timestamp, toDateTime(1755798301)) AND mapContains(attributes_string_6, 'db.system') AND (notEquals((if(mapContains(attributes_string_6, 'db.system'), arrayElement(attributes_string_6, 'db.system'), NULL) AS `db.system_TYPE_STRING`), '') OR xor(isNull(`db.system_TYPE_STRING`), isNull(''))) AND mapContains(attributes_string_39, 'sentry.group') AND (notEquals(`sentry.group_TYPE_STRING`, '') OR xor(isNull(`sentry.group_TYPE_STRING`), isNull(''))) AND mapContains(attributes_float_5, 'sentry.duration_ms') AND equals(item_type, 1) GROUP BY `sentry.op_TYPE_STRING`, `sentry.group_TYPE_STRING`, `sentry.project_id_TYPE_INT`, `sentry.normalized_description_TYPE_STRING`, `sentry.transaction_TYPE_STRING` ORDER BY `avg(span.duration)` DESC LIMIT 4 OFFSET 0), Stack trace (when copying this message, always include the lines below):

Web log:

web-1  | 17:53:07 [ERROR] django.request: Gateway Timeout: /api/0/organizations/last/events-stats/ (status_code=504 request=<WSGIRequest: GET '/api/0/organizations/last/events-stats/?dataset=spans&excludeOther=0&interval=1d&partial=1&per_page=50&query=span.op%3Aqueue.process&referrer=api.insights.backend.overview.jobs-chart&sampling=NORMAL&statsPeriod=30d&transformAliasToInputFormat=0&yAxis=trace_status_rate%28internal_error%29&yAxis=count%28span.duration%29'>)

Running the query by hand in clickhouse-client within the container yields:

4 rows in set. Elapsed: 18.297 sec. Processed 136.07 million rows, 64.32 GB (7.44 million rows/s., 3.52 GB/s.) Peak memory usage: 284.89 MiB.

Not the fastest disks in the world, but still a lot of processing.

Count of table:

SELECT count(*)
FROM eap_items_1_local

Query id: e1d48846-cbbb-4492-9ba7-c5dadfeca1db

   ┌───count()─┐
1. │ 176510259 │ -- 176.51 million
   └───────────┘

Event ID

No response

frederikspang avatar Sep 20 '25 18:09 frederikspang

I am in the same position as you. Insights Backend is also unresponsive for me. It seems I just have a "few" more entries in ClickHouse 😅

SELECT count(*)
FROM eap_items_1_local

Query id: 4dc1eae8-969c-454b-885c-fd886d912b0e

   ┌────count()─┐
1. │ 1369086710 │ -- 1.37 billion
   └────────────┘

I upgraded from 25.8.0 to 25.9.0

LordSimal avatar Sep 20 '25 18:09 LordSimal

Any news on this? I've had this issue since 24.8.0; I was hoping it would be fixed in this release.

zKoz210 avatar Sep 22 '25 15:09 zKoz210

Any news on this? I've had this issue since 24.8.0; I was hoping it would be fixed in this release.

Is 24.8.0 a typo? Or have you been running EAP with some other early-access feature flags?

I did for 25.8.0, and waited to raise the issue until GA of those features, like ourlogs, the new stats UI, Taskbroker, and other things I believed to be related.

frederikspang avatar Sep 22 '25 15:09 frederikspang

Oh, yes, mistake, 25.8.0

zKoz210 avatar Sep 22 '25 16:09 zKoz210

I am assuming we are not running a contest here, but:

   ┌────count()─┐
1. │ 7507373159 │ -- 7.51 billion
   └────────────┘

And indeed, unless I manage to set the time interval to 24 hours, the Insights pages (in my case Laravel Backend) just crash the server (16c/64GB and 64GB swap on NVMe storage). The "good" thing is that it defaults to 14 days, so every time I need to open the page I have to switch to 24h and then wait 15-30 minutes for the server to calm down.

Curious whether there is something to fix here, or whether it's because we run ClickHouse on a single node that cannot distribute the load, and we are simply running into limitations of a setup that sentry.io has no problem with because they presumably have quite a few nodes in their ClickHouse cluster.

This does not seem to be new in this release though; it was already happening in 25.8.0. But I might be wrong, since I'm using nightly builds most of the time, so there might be a related code change, or a missed migration missing an index, or whatever.

stayallive avatar Sep 23 '25 07:09 stayallive

I adjusted my sentry.conf.py to the following nodestore config:

SENTRY_NODESTORE = "sentry_nodestore_s3.S3PassthroughDjangoNodeStorage"
SENTRY_NODESTORE_OPTIONS = {
    "compression": True,
    "endpoint_url": "http://seaweedfs:8333",
    "bucket_path": "nodestore",
    "bucket_name": "nodestore",
    "region_name": "us-east-1",
    "aws_access_key_id": "sentry",
    "aws_secret_access_key": "sentry",
    "delete_through": True,     # delete through to the Django nodestore (delete object from S3 and Django)
    "write_through": False,     # write through to the Django nodestore (duplicate writes to S3 and Django)
    "read_through": True,       # read through to the Django nodestore (if object not found in S3)
}

and it made at least the issue detail page work again.

The Insights area is still broken, though.

LordSimal avatar Sep 23 '25 08:09 LordSimal

Doubling down on @stayallive's point: I went from 25.8.0 to nightly to test a pgbouncer setup, and then actually back to 25.8.0. It MAY have been a missed migration, because I saw these timeouts in 25.8.0 as well, with some EAP feature flags enabled.

frederikspang avatar Sep 23 '25 09:09 frederikspang

Yep, Insights is pretty broken for us as well (a couple of billion rows in that table). Is anyone aware of a setting with which we could make Insights default to 24 hours instead of 14 days? I'm assuming this must be a problem with the table definition/migrations, since this apparently works in Sentry cloud on similar volumes. We run this with 64GB of RAM, most of which is un-utilized.

gersmann avatar Sep 23 '25 11:09 gersmann

FYI - I have been able to mitigate this (I think) by disabling the use-eap flag for Insights: https://github.com/getsentry/self-hosted/commit/8ad99169994e6c67f50488e27646d30c9ca80703#diff-f4bf6cddc34b3c80a2bb2497a46d5e7315424593993d0ce41ab7c48064ba165cR327
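
In practice that presumably means overriding the flag in sentry.conf.py after the defaults are set, roughly like the sketch below. The exact flag name is the one added in the linked commit; the name used here is only a placeholder:

# sentry.conf.py -- sketch only; take the exact flag name from the commit linked above.
# Self-hosted feature flags are plain booleans in the SENTRY_FEATURES dict, so the
# EAP-backed Insights queries can be switched off by overriding that entry.
SENTRY_FEATURES["organizations:insights-use-eap"] = False  # placeholder flag name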

gersmann avatar Sep 25 '25 15:09 gersmann

@aldy505 @phacops Hi! Could you please let me know when someone more experienced than us can look at this issue? We can't use Sentry because nothing is loading.

zKoz210 avatar Sep 26 '25 14:09 zKoz210

Following a debug session in Discord (#self-hosted), "some" of the queries that overspend resources are (from the query log):

SELECT query
FROM clusterAllReplicas(default, system.query_log)
WHERE initial_query_id = 'randomized-d5e5b196fece40c39504078ab11a4a96'

Query id: fbce3fb5-8525-4838-ba45-20abe23fbd73

   ┌─query──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ SELECT (cast(lower(leftPad(hex(item_id), if(greater(length(hex(item_id)), 16), 32, 16), '0')), 'String') AS `sentry.item_id_TYPE_STRING`), (if(mapContains(attributes_string_19, 'sentry.op'), arrayElement(attributes_string_19, 'sentry.op'), NULL) AS `sentry.op_TYPE_STRING`), (if(mapContains(attributes_string_11, 'sentry.raw_description'), arrayElement(attributes_string_11, 'sentry.raw_description'), NULL) AS `sentry.raw_description_TYPE_STRING`), (if(mapContains(attributes_float_5, 'sentry.duration_ms'), arrayElement(attributes_float_5, 'sentry.duration_ms'), NULL) AS `sentry.duration_ms_TYPE_DOUBLE`), (coalesce(if(mapContains(attributes_string_20, 'sentry.transaction'), arrayElement(attributes_string_20, 'sentry.transaction'), NULL), if(mapContains(attributes_string_3, 'transaction'), arrayElement(attributes_string_3, 'transaction'), NULL)) AS `sentry.transaction_TYPE_STRING`), (cast(timestamp, 'Float64') AS `sentry.timestamp_TYPE_DOUBLE`), (transform(CAST(ifNull((cast(project_id, 'Int64') AS `sentry.project_id_TYPE_INT`), ''), 'String'), ['13', '17', '10', '9', '7', '5', '2', '14', '8', '3', '12'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'], 'unknown') AS project), (cast(replaceAll(toString(trace_id), '-', ''), 'String') AS `sentry.trace_id_TYPE_STRING`), (if(mapContains(attributes_string_11, 'sentry.segment_id'), arrayElement(attributes_string_11, 'sentry.segment_id'), NULL) AS `sentry.segment_id_TYPE_STRING`), (transform(CAST(ifNull(`sentry.project_id_TYPE_INT`, ''), 'String'), ['13', '17', '10', '9', '7', '5', '2', '14', '8', '3', '12'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'], 'unknown') AS `project.name`) FROM eap_items_1_local WHERE in(project_id, [2, 3, 5, 7, 8, 9, 10, 12, 13, 14, 17]) AND equals(organization_id, 1) AND less(timestamp, toDateTime(1759217701)) AND greaterOrEquals(timestamp, toDateTime(1758008041)) AND true AND equals(item_type, 1) ORDER BY `sentry.timestamp_TYPE_DOUBLE` DESC LIMIT 51 OFFSET 0 │
2. │ SELECT (cast(lower(leftPad(hex(item_id), if(greater(length(hex(item_id)), 16), 32, 16), '0')), 'String') AS `sentry.item_id_TYPE_STRING`), (if(mapContains(attributes_string_19, 'sentry.op'), arrayElement(attributes_string_19, 'sentry.op'), NULL) AS `sentry.op_TYPE_STRING`), (if(mapContains(attributes_string_11, 'sentry.raw_description'), arrayElement(attributes_string_11, 'sentry.raw_description'), NULL) AS `sentry.raw_description_TYPE_STRING`), (if(mapContains(attributes_float_5, 'sentry.duration_ms'), arrayElement(attributes_float_5, 'sentry.duration_ms'), NULL) AS `sentry.duration_ms_TYPE_DOUBLE`), (coalesce(if(mapContains(attributes_string_20, 'sentry.transaction'), arrayElement(attributes_string_20, 'sentry.transaction'), NULL), if(mapContains(attributes_string_3, 'transaction'), arrayElement(attributes_string_3, 'transaction'), NULL)) AS `sentry.transaction_TYPE_STRING`), (cast(timestamp, 'Float64') AS `sentry.timestamp_TYPE_DOUBLE`), (transform(CAST(ifNull((cast(project_id, 'Int64') AS `sentry.project_id_TYPE_INT`), ''), 'String'), ['13', '17', '10', '9', '7', '5', '2', '14', '8', '3', '12'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'], 'unknown') AS project), (cast(replaceAll(toString(trace_id), '-', ''), 'String') AS `sentry.trace_id_TYPE_STRING`), (if(mapContains(attributes_string_11, 'sentry.segment_id'), arrayElement(attributes_string_11, 'sentry.segment_id'), NULL) AS `sentry.segment_id_TYPE_STRING`), (transform(CAST(ifNull(`sentry.project_id_TYPE_INT`, ''), 'String'), ['13', '17', '10', '9', '7', '5', '2', '14', '8', '3', '12'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'], 'unknown') AS `project.name`) FROM eap_items_1_local WHERE in(project_id, [2, 3, 5, 7, 8, 9, 10, 12, 13, 14, 17]) AND equals(organization_id, 1) AND less(timestamp, toDateTime(1759217701)) AND greaterOrEquals(timestamp, toDateTime(1758008041)) AND true AND equals(item_type, 1) ORDER BY `sentry.timestamp_TYPE_DOUBLE` DESC LIMIT 51 OFFSET 0 │

Using config.xml:

    <query_log>
      <database>system</database>
      <table>query_log</table>
      <ttl>event_date + INTERVAL 1 day</ttl>
      <flush_interval_milliseconds>7500</flush_interval_milliseconds>
      <max_size_rows>1048576</max_size_rows>
      <reserved_size_rows>8192</reserved_size_rows>
      <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
      <flush_on_crash>false</flush_on_crash>
    </query_log>

frederikspang avatar Sep 30 '25 07:09 frederikspang

To debug ClickHouse queries, you have to adjust your clickhouse/config.xml and replace

<query_log remove="remove"/>

with

<query_log>
  <database>system</database>
  <table>query_log</table>
  <ttl>event_date + INTERVAL 1 day</ttl>
  <flush_interval_milliseconds>7500</flush_interval_milliseconds>
  <max_size_rows>1048576</max_size_rows>
  <reserved_size_rows>8192</reserved_size_rows>
  <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
  <flush_on_crash>false</flush_on_crash>
</query_log>

then restart your Sentry instance as a whole via docker compose down && docker compose up -d --wait and wait a few minutes.


After that you need to open a ClickHouse CLI via

docker compose exec clickhouse clickhouse-client

and execute the following query:

SELECT
    type,
    event_time,
    initial_query_id,
    formatReadableSize(memory_usage) AS memory,
    `ProfileEvents.Values`[indexOf(`ProfileEvents.Names`, 'UserTimeMicroseconds')] AS userCPU,
    `ProfileEvents.Values`[indexOf(`ProfileEvents.Names`, 'SystemTimeMicroseconds')] AS systemCPU,
    normalizedQueryHash(query) AS normalized_query_hash
FROM system.query_log
ORDER BY memory_usage DESC
LIMIT 10

this will result in something like

┌─type─────────────────────┬──────────event_time─┬─initial_query_id────────────────────────────┬─memory─────┬──userCPU─┬─systemCPU─┬─normalized_query_hash─┐
 1. │ ExceptionWhileProcessing │ 2025-09-30 07:22:00 │ randomized-ab42c85f7f6b44faa114e43bb201d0cc │ 1.20 GiB   │ 36715919 │   4302509 │  13521358877695970890 │
 2. │ QueryFinish              │ 2025-09-30 07:21:30 │ randomized-1488b44299044b69ba484f029b58bdfa │ 1.04 GiB   │ 26935740 │   3747378 │  13521358877695970890 │
 3. │ QueryFinish              │ 2025-09-30 07:17:41 │ 3aafa240-b881-45d1-b766-22e1ae771af5        │ 944.99 MiB │   359105 │    171500 │   7330655308746279451 │
 4. │ QueryFinish              │ 2025-09-30 07:19:39 │ cb070472-3fcb-4db7-bbd1-2b770e96d056        │ 929.71 MiB │   258367 │     93965 │   7330655308746279451 │
 5. │ QueryFinish              │ 2025-09-30 07:19:50 │ 2029f173-a5ba-462a-ae9b-080367cf1fb8        │ 913.64 MiB │   217354 │    109247 │   7330655308746279451 │
 6. │ QueryFinish              │ 2025-09-30 07:22:01 │ a2558303-bf92-4e99-aa98-0d67b93ee9e2        │ 913.15 MiB │   379315 │    113498 │   7330655308746279451 │
 7. │ QueryFinish              │ 2025-09-30 07:20:04 │ 51b71240-52f2-4f39-9168-548dd6f74e3e        │ 910.61 MiB │   238052 │     57025 │   7330655308746279451 │
 8. │ QueryFinish              │ 2025-09-30 07:22:13 │ randomized-250237e9df7049dcb8f415d9c76db28b │ 399.61 MiB │ 80140405 │   2365666 │    467395772582250954 │
 9. │ QueryFinish              │ 2025-09-30 07:22:08 │ 97d015f5-b673-4714-a813-91073f70f83a        │ 271.14 MiB │    17948 │     23297 │   7949310113346595683 │
10. │ QueryFinish              │ 2025-09-30 07:20:04 │ 69e89ea2-fbd5-412f-bf8e-000cfe4eacc4        │ 270.22 MiB │    10641 │     11810 │   7949310113346595683 │
    └──────────────────────────┴─────────────────────┴─────────────────────────────────────────────┴────────────┴──────────┴───────────┴───────────────────────┘

This doesn't return the raw SQL query; that needs to be fetched separately via

SELECT query
FROM clusterAllReplicas(default, system.query_log)
WHERE initial_query_id = '<id>'

For me, I got message.txt

LordSimal avatar Sep 30 '25 07:09 LordSimal

I did another test and found another SQL query, which takes up > 1 GB of RAM:

 6. │ ExceptionWhileProcessing │ 2025-09-30 10:32:09 │ randomized-781026ea3ec64a0b941a16f5dbe5fa53 │ 1.23 GiB   │ 166710882 │   4161280 │   6013952681561621165 │

query.txt

LordSimal avatar Sep 30 '25 10:09 LordSimal

For those who don't join Sentry's Discord: Pierre is looking at this

aldy505 avatar Oct 04 '25 01:10 aldy505

Hi, sorry for being annoying, but is there any insight into how to solve this problem, and will a fix land before the 25.10.0 release?

zKoz210 avatar Oct 07 '25 13:10 zKoz210

For me I can make the Insights Backend "work" again if I filter the view by a specific project instead of showing the data of all my projects.

Something about the query, which contains > 200 project IDs and related data, seems to break ClickHouse 😁

Still not a "true solution" of course but a workaround to use the backend again.

LordSimal avatar Oct 07 '25 13:10 LordSimal

No, I don't think this will be fixed when 25.11.0 is released.

aldy505 avatar Nov 12 '25 13:11 aldy505

+1 with this issue

bazylys avatar Nov 17 '25 17:11 bazylys

I have the same problem with Sentry 25.10.

windreid avatar Nov 18 '25 17:11 windreid

Same problem on 25.9.0. Does 25.8.0 work well?

langovoi avatar Nov 21 '25 16:11 langovoi

Same problem on 25.9.0. Does 25.8.0 work well?

I believe it depends on your span/transaction volume; it might work if you have a small amount. Most people impacted by this problem have more than 5M spans per day; the current workaround is just to choose a lower time interval.

The fix should land in the 25.12.0 release.

aldy505 avatar Nov 22 '25 00:11 aldy505

Found a fix for this. The issue is that Snuba has downsampled tables (eap_items_1_downsample_8_local, etc.) but the default threshold to use them is set to 1 billion rows, which is too high for most self-hosted instances.

You can lower this threshold via Snuba's runtime config in Redis:

docker compose exec redis redis-cli -n 1 HSET snuba-config "OutcomesBasedRoutingStrategy.max_items_before_downsampling" "100000000"

This sets the threshold to 100 million. After this change, queries started hitting the 8x downsampled table instead of the full one, and query times improved significantly.

To verify Snuba sees the config:

docker compose exec snuba-api python3 -c "from snuba import state; print(state.get_int_config('OutcomesBasedRoutingStrategy.max_items_before_downsampling'))"

Should print 100000000. If it prints None, you may need to check the correct Redis db number for your setup.

To verify queries use downsampled tables, check system.processes in ClickHouse - you should see queries going to eap_items_1_downsample_8_local instead of eap_items_1_local.
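
If you enabled the query_log as described earlier in this thread, you can also confirm it after the fact with something like this (a sketch; adjust the LIKE pattern to your table names):

SELECT
    event_time,
    query_duration_ms,
    tables
FROM system.query_log
WHERE type = 'QueryFinish'
    AND arrayExists(t -> t LIKE '%eap_items_1%', tables)
ORDER BY event_time DESC
LIMIT 10

Rows that took the downsampled path should list eap_items_1_downsample_8_local (or another downsample level) in the tables column.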

To revert:

docker compose exec redis redis-cli -n 1 HDEL snuba-config "OutcomesBasedRoutingStrategy.max_items_before_downsampling"

This only affects query routing (reads), not data ingestion. Config persists in Redis across container restarts.

nomarek avatar Dec 03 '25 09:12 nomarek

@nomarek Thank you so much, this really works and I can get 90 days of data without any errors! Hooray!

zKoz210 avatar Dec 03 '25 09:12 zKoz210

Thank you; I gave this a go, and it seems to improve the performance, but some of the graphs still don't load

[screenshot: some Insights graphs still failing to load]

davidjmemmett avatar Dec 03 '25 10:12 davidjmemmett

I opted to go a little lower, "10000000", which seems to allow 90-day graphs to also load in a reasonable time now. You can play around with the threshold for your own dataset/server. But awesome that we found this; now to find a nice way to make this configurable via .env or something, because this is a great fix.
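
Until there is a supported knob, a rough sketch of what that could look like is a small post-deploy script that reads the threshold from .env and applies it via redis-cli (SNUBA_MAX_ITEMS_BEFORE_DOWNSAMPLING is a made-up variable name, and the Redis db index assumes the default self-hosted layout):

#!/usr/bin/env bash
# sketch: re-apply the Snuba downsampling threshold after `docker compose up -d`
set -euo pipefail
source .env
if [ -n "${SNUBA_MAX_ITEMS_BEFORE_DOWNSAMPLING:-}" ]; then
  docker compose exec redis redis-cli -n 1 \
    HSET snuba-config "OutcomesBasedRoutingStrategy.max_items_before_downsampling" \
    "${SNUBA_MAX_ITEMS_BEFORE_DOWNSAMPLING}"
fi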

stayallive avatar Dec 03 '25 14:12 stayallive

Found a fix for this. The issue is that Snuba has downsampled tables (eap_items_1_downsample_8_local, etc.) but the default threshold to use them is set to 1 billion rows, which is too high for most self-hosted instances.

Nice find. The craziest thing about this is that those dashboard views run the heavy aggregation queries on demand, rather than against a materialized view or some other pre-computed aggregation.

gersmann avatar Dec 05 '25 15:12 gersmann

I think the latest release (25.11.1) solved it for us.

gersmann avatar Dec 05 '25 19:12 gersmann