umami icon indicating copy to clipboard operation
umami copied to clipboard

Postgres Migration 5 hangs seemingly forever

Open bjornpijnacker opened this issue 1 year ago • 1 comments

Describe the Bug

Postgres migration 5 is running for a long time on my database (>1h). At first it used ~70% of node CPU for approximately half an hour but after more than one hour it hasn't completed yet.

The migration first creates a new column with a quite inefficient subquery. I assume this is where the CPU load comes from. Then it creates indices. It is stuck somewhere in between here as CPU load has gone down to regular levels but select * from pg_stat_progress_create_index; is empty.

The website_event table has ~10M records.

Database

PostgreSQL

Relevant log output

Output of:
SELECT datname, pid, state, query, age(clock_timestamp(), query_start)
FROM pg_stat_activity
where datname = 'umami'
  AND state <> 'idle'
  AND query NOT LIKE '% FROM pg_stat_activity %';

---

umami,24992,active,"-- AlterTable
ALTER TABLE ""website_event"" ADD COLUMN ""visit_id"" UUID NULL;

UPDATE ""website_event"" we
SET visit_id = a.uuid
FROM (SELECT DISTINCT
        s.session_id,
        s.visit_time,
        gen_random_uuid() uuid
    FROM (SELECT DISTINCT session_id,
            date_trunc('hour', created_at) visit_time
        FROM ""website_event"") s) a
WHERE we.session_id = a.session_id 
    and date_trunc('hour', we.created_at) = a.visit_time;

ALTER TABLE ""website_event"" ALTER COLUMN ""visit_id"" SET NOT NULL;

-- CreateIndex
CREATE INDEX ""website_event_visit_id_idx"" ON ""website_event""(""visit_id"");

-- CreateIndex
CREATE INDEX ""website_event_website_id_visit_id_created_at_idx"" ON ""website_event""(""website_id"", ""visit_id"", ""created_at"");",0 years 0 mons 0 days 1 hours 16 mins 29.793777 secs

Which Umami version are you using? (if relevant)

Upgrading from 2.8.0 to 2.13.2

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

Azure AKS and Azure Postgresql DB

bjornpijnacker avatar Sep 03 '24 13:09 bjornpijnacker

After a few more hours the query finally finished. This also required upping IOPS in Azure from 120 to 3750 for the query not to hit the IOPS limit. Considering that it still took many hours it would have taken multiple days to finish with the (default for 32GiB) 120 IOPS limit. This is using Azure Postgres Database with a Standard D2ds_v5 node which is not particularly cheap either.

I think that the migration queries should be optimized, as this query doesn't feel particularly efficient. Using, for instance, temporary tables (or materialized views) instead of expensive subqueries may help. It might also be a good idea to test the queries for performance as 10 million records is to be expected of Umami deployments which run multiple years, and to note in the release notes when a query will run long so we can prepare production deployments as such.

I'll leave the issue open so maintainers can decide if any further steps may be suitable.

bjornpijnacker avatar Sep 03 '24 18:09 bjornpijnacker

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Nov 03 '24 02:11 github-actions[bot]

Presumably still a problem.

markerikson avatar Nov 03 '24 02:11 markerikson

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Jan 04 '25 01:01 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Jan 11 '25 02:01 github-actions[bot]

Still a problem, can this be reopened?

bjornpijnacker avatar Mar 10 '25 09:03 bjornpijnacker