metrics(compaction): Huge cancelled compaction count shown in Grafana
Describe the bug
I saw a huge number of cancelled compaction tasks when running nexmark_q7, which looks abnormal. However, source throughput is not affected. If this is expected, I think we should have a separate panel for cancelled compaction tasks to avoid mixing them up with successful/failed compaction tasks.
To Reproduce
- Use risedev-bench to start a 3-cn cluster.
- Create sources:
```sql
CREATE SOURCE person (
    "id" BIGINT,
    "name" VARCHAR,
    "email_address" VARCHAR,
    "credit_card" VARCHAR,
    "city" VARCHAR,
    "state" VARCHAR,
    "date_time" TIMESTAMP,
    "extra" VARCHAR
) WITH (
    connector = 'nexmark',
    nexmark.table.type = 'Person',
    nexmark.split.num = '8',
    nexmark.min.event.gap.in.ns = '0'
) ROW FORMAT JSON;

CREATE SOURCE auction (
    "id" BIGINT,
    "item_name" VARCHAR,
    "description" VARCHAR,
    "initial_bid" BIGINT,
    "reserve" BIGINT,
    "date_time" TIMESTAMP,
    "expires" TIMESTAMP,
    "seller" BIGINT,
    "category" BIGINT,
    "extra" VARCHAR
) WITH (
    connector = 'nexmark',
    nexmark.table.type = 'Auction',
    nexmark.split.num = '8',
    nexmark.min.event.gap.in.ns = '0'
) ROW FORMAT JSON;

CREATE SOURCE bid (
    "auction" BIGINT,
    "bidder" BIGINT,
    "price" BIGINT,
    "channel" VARCHAR,
    "url" VARCHAR,
    "date_time" TIMESTAMP,
    "extra" VARCHAR
) WITH (
    connector = 'nexmark',
    nexmark.table.type = 'Bid',
    nexmark.split.num = '8',
    nexmark.min.event.gap.in.ns = '0'
) ROW FORMAT JSON;
```
- Create MView:
```sql
CREATE MATERIALIZED VIEW nexmark_q7 AS
SELECT
    B.auction,
    B.price,
    B.bidder,
    B.date_time
FROM
    bid B
    JOIN (
        SELECT
            MAX(price) AS maxprice,
            window_end AS date_time
        FROM
            TUMBLE(bid, date_time, INTERVAL '10' SECOND)
        GROUP BY
            window_end
    ) B1 ON B.price = B1.maxprice
WHERE
    B.date_time BETWEEN B1.date_time - INTERVAL '10' SECOND
    AND B1.date_time;
```
- Wait for 20 minutes.
Expected behavior
No response
Additional context
No response
I think the cancellation is caused by no idle compactor, which is expected.
I see. However, repeatedly triggering compaction and then cancelling it is kind of a waste of work if we don't have an idle compactor. I think #5321 can help reduce the cancellation count.
I feel like we need to separate the report label (TaskStatus) to distinguish failure from cancellation in Grafana. This helps us find problems more directly. (#5430)
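For illustration only, here is a minimal sketch of what the reporting side could look like (the metric and helper names below are hypothetical, not RisingWave's actual code): a counter with a `task_status` label lets a Grafana panel plot success / failed / canceled as separate series, or filter on a single status such as cancellation due to no idle compactor.

```rust
use prometheus::{register_int_counter_vec, IntCounterVec};

// Hypothetical metric; the point is the extra `task_status` label, which lets
// Grafana split success / failed / canceled into separate series in one panel.
fn compact_task_counter() -> IntCounterVec {
    register_int_counter_vec!(
        "compact_task_count",
        "Compaction tasks grouped by their final TaskStatus",
        &["task_status"]
    )
    .unwrap()
}

fn report_task_status(counter: &IntCounterVec, status: &str) {
    // `status` would be derived from the TaskStatus enum,
    // e.g. "success", "failed", or "canceled_no_idle_compactor".
    counter.with_label_values(&[status]).inc();
}
```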
after #5405

Indeed, the reported issue has been fixed. But we can observe that the number of L0 SSTs is large, and the duration of each compaction is about 10s. IMO, the default value of `no_available_compactor_stall_sec` is a little too long.
revert the change and bench again

Unfortunately, we found no significant change in the L0 SST count.
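To make the trade-off concrete, here is a minimal sketch of the assumed control flow (not the actual compaction scheduler code) showing how a stall like `no_available_compactor_stall_sec` interacts with L0 growth: while the scheduler sleeps waiting for an idle compactor, flushes keep adding SSTs to L0, so the stall length bounds how long L0 can grow between scheduling attempts.

```rust
use std::time::Duration;
use tokio::time::sleep;

// Hypothetical stand-ins for the real types; only the control flow matters here.
struct Compactor;
struct Scheduler {
    // Mirrors the `no_available_compactor_stall_sec` config knob.
    no_available_compactor_stall: Duration,
}

impl Scheduler {
    fn next_idle_compactor(&self) -> Option<Compactor> {
        None // placeholder: the real scheduler tracks per-compactor load
    }

    async fn assign_task(&self, _compactor: Compactor) {
        // placeholder: send the compact task to the chosen compactor
    }

    async fn schedule_loop(&self) {
        loop {
            match self.next_idle_compactor() {
                Some(c) => self.assign_task(c).await,
                None => {
                    // While the scheduler sleeps, flushes keep adding SSTs to L0,
                    // so a 10s stall gives L0 that much time to pile up before
                    // the next scheduling attempt; 5s halves that window.
                    sleep(self.no_available_compactor_stall).await;
                }
            }
        }
    }
}
```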
Some other observations:
- Compaction Throughput has increased after the revert. It's strange that throughput is higher but SST count is similar.
- Compactor Running Task Count is much more stable at the maximum limit before this PR. So yes, 10s might be a tad too long.
Though I haven't paid special attention to the dashboard, I suspect that after #5405 the CPU usage of the meta node should be lower when there's no idle compactor, and the log has certainly become much tidier 🤣.
After resolving #5321, I would expect zero or few tasks canceled due to no idle compactor, depending on the implementation, and a stable running task count on the compactor, constantly hitting the maximum limit.
I ran the same bench as Li, with both f903d4a6 and an earlier commit f1a84e5d that excludes the recent refactor of the compaction scheduler (no overwhelming cancellation problem). The L0 file number piles up in both cases, though f903d4a6 is worse, with an almost doubled L0 file number. Note that there are several other commits between f903d4a6 and f1a84e5d that can affect compaction efficiency.
```bash
./bin/bench --type=tpch-std --qps=800000 --scale 80.0 --query 20 --partition 12 \
  $(cat ~/risingwave-deploy/tpch-bench-args-kafka) \
  $(cat ~/risingwave-deploy/tpch-bench-args-frontend) \
  --prepare-sql ./assets/data/create_materialized_source.sql
```
Result of f1a84e5d (earlier commit)
Result of f903d4a6 (no_available_compactor_stall_sec=5)
~~I wonder if the L0 file piling up is caused by CPU contention, as we deploy compactor and source in the same EC2 instance, and its CPU usage is close to 100%.~~
~~Will confirm later.~~
f903d4a6 (no_available_compactor_stall_sec=5), with each compactor deployed in a dedicated 16vCPU instance:
- With 1 compactor, the L0 file number still piles up, but the max number of L0 files is smaller.
- With 2 compactors, the L0 file number looks good now, but source throughput is lower.
Full metrics
Note that there are several other commits between the base version f1a84e5d and the current version f903d4a6 that can affect compaction behavior. So we compare 1) f1a84e5d with 2) f1a84e5d + cherry-picked 3afa7cfb and f903d4a6 (no_available_compactor_stall_sec=5). The result shows that the max L0 file number from both tests is 600~800. So I think 3afa7cfb + f903d4a6 itself doesn't introduce a regression.
> But source throughput is lower.
What could be the cause of this? Cache invalidation?
Seems that after a while, the throughput goes back up?