starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

The same sql in asynchronous materialized view in low performance

Open Guosmilesmile opened this issue 1 year ago • 0 comments

when constructing an asynchronous materialized view with Iceberg as an external table and updating partitions manually, the performance and efficiency are much slower than direct queries. From the logs, it can be seen that during the query, the partition information is not pushed down into the query as a filter condition, which results in low performance. Is this in line with expectations?

log: [SnapshotScan.planFiles():124] Scanning table iceb erg.iceberg_db.table snapshot 4672151414010627259 created at 2024-04-19T11:58:11.902+00:00 with filter mtimest amp > (16-digit-int)

direct sql:

select
date_trunc(“HOUR”,mtimestamp) as ts,
time_slice(mtimestamp, INTERVAL 5 second) as second_ts,
id,
HLL_UNION_AGG(hll_hash(aid)) as num
from
iceberg.iceberg_db.test
where
mtimestamp >'2024-04-18 00:00:00’and
mtimestamp <‘2024-04-18 01:00:00’
group by
time_slice(mtimestamp, INTERVAL 5 second),
date_trunc(“HOUR”,mtimestamp),
id;

materialized view sql:

CREATE MATERIALIZED VIEW test_view
REFRESH DEFERRED MANUAL
PARTITION BY ts
PROPERTIES
(
“partition_refresh_number”=“1”,
“replication_num”=“3”,
“query_rewrite_consistency”=“disable”,
“session.enable_spill”=“true”
)
AS
select
date_trunc(“HOUR”,mtimestamp) as ts,
time_slice(mtimestamp, INTERVAL 5 second) as second_ts,
id,
HLL_UNION_AGG(hll_hash(aid)) as num
from
iceberg.iceberg_db.test
where mtimestamp >'2024-04-18 00:00:00’and
group by
time_slice(mtimestamp, INTERVAL 5 second),
date_trunc(“HOUR”,mtimestamp),
id;

REFRESH MATERIALIZED VIEW test_view PARTITION START (“2024-04-19 00:00:00”) END (“2024-04-19 01:00:00”) ;

StarRocks version (Required)

  • 3.2.4 or 3.2.6

Guosmilesmile avatar Apr 20 '24 08:04 Guosmilesmile