2.10.x Author/Sysadmin dataset editing slow due to missing ordered index's
CKAN version
CKAN 2.10.1
Describe the bug
On legacy large CKAN instances (www.data.qld.gov.au / www.publications.qld.gov.au), when a new dataset is created and resources are being added, or adding resources to a dataset that does not have many activity/tracking_summary details. Page load times are very slow for 'author/sysadmin' users.
Steps to reproduce
Have a very large database with lots of data in the tracking_summary "9,436,568 rows" and activity "2,131,004 rows" tables that is unrelated to the dataset that is about to be edited.
Then attempt to update various datasets including uploading a new document. As well as create a new dataset then edit it right after.
Page will take longer than 180 seconds to load. With our setup, the CDN will give up waiting for the system and return HTTP 502.
Expected behavior
Normal dataset/resource updating/adding without waiting more than the upload time for the file. Or under 3 seconds for url resource creation.
Additional details
When reviewing the performance metrics on the database, it was noticed that it took 27seconds to return this query:
SELECT tracking_summary.url AS tracking_summary_url, tracking_summary.package_id AS tracking_summary_package_id, tracking_summary.tracking_type AS tracking_summary_tracking_type, tracking_summary.count AS tracking_summary_count, tracking_summary.running_total AS tracking_summary_running_total, tracking_summary.recent_views AS tracking_summary_recent_views, tracking_summary.tracking_date AS tracking_summary_tracking_date
FROM tracking_summary
WHERE tracking_summary.package_id = ':package_id:' ORDER BY tracking_date desc
LIMIT 1
AND 1.27 calls / sec, 7.26 avg latency (ms)/call.
SELECT activity.id AS activity_id, activity.timestamp AS activity_timestamp, activity.user_id AS activity_user_id, activity.object_id AS activity_object_id, activity.revision_id AS activity_revision_id, activity.activity_type AS activity_activity_type, activity.data AS activity_data
FROM activity LEFT OUTER JOIN package ON package.id = activity.object_id AND package.private = false
WHERE (package.owner_org = ':packageOwnerOrg:' OR activity.object_id = ':ObjectID:') AND (activity.user_id NOT IN (':UserID:')) ORDER BY activity.timestamp DESC
LIMIT 32
When the following indexes were manually created, the backend author views became performant again.
CREATE INDEX IF NOT EXISTS activity_object_id_user_id_order_by_timestamp_desc
ON public.activity USING btree
(user_id COLLATE pg_catalog."default" ASC NULLS LAST, object_id COLLATE pg_catalog."default" ASC NULLS LAST, "timestamp" DESC NULLS FIRST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS tracking_summary_package_id_orderby_tracking_date_desc
ON public.tracking_summary USING btree
(package_id COLLATE pg_catalog."default" ASC NULLS LAST, tracking_date DESC NULLS FIRST)
TABLESPACE pg_default;
It was noted that there are indeed indexes on these tables, but they are set to ASC mode and may not being used and full table scan's are occuring.
i.e.
CREATE INDEX IF NOT EXISTS idx_activity_object_id
ON public.activity USING btree
(object_id COLLATE pg_catalog."default" ASC NULLS LAST, "timestamp" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS idx_activity_user_id
ON public.activity USING btree
(user_id COLLATE pg_catalog."default" ASC NULLS LAST, "timestamp" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS tracking_summary_package_id
ON public.tracking_summary USING btree
(package_id COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
There's 2 halves here, tracking and activities.
The tracking_summary table looks like it's only useful to be storing approx 14 days of data, but we don't seem to have anything to compress/clear that out. Given the O(30sec) time of that query though, I'm wondering if you're also seeing performance problems on the tracking update commands.
I'm a little surprised that the index isn't being used when you're selecting out one package id, and then doing the ordering using the multicolumn index. Unfortunately, I don't have a large tracking table to check that on. I'd be interested to see what the explain analyze looks like in the base case for that query with enable_seqscan = off.
(also, note that the ASC NULLS LAST and DESC NULLS FIRST are the default null configuration for ordering: https://www.postgresql.org/docs/current/indexes-ordering.html)
The activity id looks like something I was looking at in https://github.com/ckan/ckan/pull/7878#issuecomment-1803828382. Does your explain analyze look similar to those?
explain analyze SELECT tracking_summary.url AS tracking_summary_url, tracking_summary.package_id AS tracking_summary_package_id, tracking_summary.tracking_type AS tracking_summary_tracking_type, tracking_summary.count AS tracking_summary_count, tracking_summary.running_total AS tracking_summary_running_total, tracking_summary.recent_views AS tracking_summary_recent_views, tracking_summary.tracking_date AS tracking_summary_tracking_date
FROM tracking_summary
WHERE tracking_summary.package_id = ':package_id:' ORDER BY tracking_date desc
LIMIT 1
returns
"Limit (cost=0.56..4.22 rows=1 width=115) (actual time=1.839..1.839 rows=0 loops=1)"
" -> Index Scan using tracking_summary_package_id_orderby_tracking_date_desc on tracking_summary (cost=0.56..2029.55 rows=555 width=115) (actual time=1.838..1.839 rows=0 loops=1)"
" Index Cond: (package_id = ':package_id:'::text)"
"Planning Time: 0.313 ms"
"Execution Time: 1.854 ms"