ckan icon indicating copy to clipboard operation
ckan copied to clipboard

ckanext-activity: performance improvements

Open duttonw opened this issue 1 year ago • 0 comments

Fixes #

#7953

Problem: Dashboards take over 180seconds to load for a logged in normal user, emailing takes over 60minutes for less than 60 users getting notifications due to: high cpu/disk utilization on postgres on inefficient data heavy queries.

Steps to reproduce: Have a couple of datasets which have been updating every 15min for the last 10 years creating 'activity' diff's. Have about 2.1million rows in activity table with data blob field fully populated.

Create default user and follow datasets (or the org). Go and visit dashboard and see how long it takes to load. Go and trigger email notifications on updated dataset/resources.

example org activity stream: https://www.data.qld.gov.au/organization/environment-science-and-innovation

Proposed fixes:

Speed up activity stream loading

ckanext-activity plugin:

  • Skip or lazy-load heavy CLOB column
  • Use subselect instead of union for more effective indexing/querying

These changes have been deployed and tested in www.data.qld.gov.au under commit https://github.com/qld-gov-au/ckan/commit/88d932e280204c338933f1cb502ab5aa86c7914f which is on top of ckan 2.10.4 . This has allowed us to reenable email notifications hourly as this was too much of a database cpu+disk and batch layer cron overlaps since we moved from 2.9 to 2.10 and the activity table 'data' column started to be fleshed out with json blob data of the point in time history.

Features:

  • [ ] includes tests covering changes
  • [ ] includes updated documentation
  • [ ] includes user-visible changes
  • [ ] includes API changes
  • [X] includes bugfix for possible backport

Please [X] all the boxes above that apply

Co-Author: @ThrawnCA

duttonw avatar Apr 08 '24 03:04 duttonw