hasjob
hasjob copied to clipboard
Partition JobImpression and UserEvent tables
Hasjob's JobImpression table has become a bottleneck in production. It along with the UserEvent table is now the biggest consumer of disk space. From production (using query from this article):
hasjob=> SELECT relname AS "relation", pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 10;
relation | size
------------------------------------+---------
job_impression | 6426 MB
user_event | 6147 MB
ix_job_impression_event_session_id | 2823 MB
job_impression_pkey | 2681 MB
ix_job_impression_datetime | 1834 MB
pg_toast_89587 | 248 MB
event_session | 229 MB
job_application | 206 MB
user_event_pkey | 202 MB
pg_toast_89514 | 122 MB
(10 rows)
(user_event
would have been larger but is not indexed for now.)
PostgreSQL's documentation recommends using vertical partitioning in such timestamp-sensitive scenarios so that indexes are smaller and inserts are more efficient. This will be somewhat cumbersome to implement as we'll no longer have SQLAlchemy's elegant abstractions, but shouldn't be too hard to manage and will be more or less transparent from within code.
One consequence of using this mechanism is that (a) we can no longer use session_id
in a unique constraint as that may span partition borders and will slow down inserts, and (b) all read queries for a given session will need to add job_impression.datetime >= session.created_at AND job_impression.datetime <= COALESCE(session.ended_at, NOW())
so that the query planner limits which partitions are read from. Session sweeping as described in #221 becomes important now.
However, the JobImpression table is typically queried on the basis of jobpost_id
and not session_id
, so this makes partition-specific queries somewhat trickier. We could consider that since jobs are supposed to expire after 30 days and will not be impressed thereafter (except for the unpublicised archive mode), these queries could be bounded to between jobpost.created_at
and jobpost.datetime + interval '30 days'
(for a total period that may exceed 30 days depending on how long it took for the draft to be published, and depending on manual updates to the datetime
column as occasionally done for customer service).
Engine Yard has a nice blog post on PostgreSQL partitioning, including a trigger that creates child tables automatically.
One expectation with partitioning is that old data that is no longer needed can be exported to a backup and removed from the database. In our case we still read old data occasionally from permalinks, particularly for viewcounts. The only answer to that problem is having this data permanently cached in another table (rather than cached in redis).
In our case we still read old data occasionally from permalinks, particularly for viewcounts.
can we find out the maximum interval between 2 job impressions after a job has expired? As in, how long was a job in stale mode before someone opened it with permalink again. And how often does that occur for, say, more than 90 days interval, giving 2 months of grace period once a job has expired. if it's not that frequent, we could move those records to a separate table periodically. and while querying by permalink, if job expiry date is older than 3 months, we check in that other table for impression data and not always.
Also, how important is viewcount for a job that's been expired for more than 3 months? Do we really need to show that at the cost of query time? Until we figure out a nice way to deal with this, we could disable showing view counts for those jobs. Every job view must be suffering from querying this large a table.
The pg_partman extension for PostgreSQL moves all the hard logic into PostgreSQL itself. We should use it.
current stat -
relation | size
------------------------------------+---------
user_event | 13 GB
job_impression | 8000 MB
job_impression_pkey | 4930 MB
ix_job_impression_event_session_id | 4404 MB
ix_job_impression_datetime | 4139 MB
event_session | 760 MB
user_event_pkey | 450 MB
job_application | 324 MB
anon_user | 321 MB
pg_toast_89587 | 319 MB
(10 rows)