django-auditlog icon indicating copy to clipboard operation
django-auditlog copied to clipboard

Schema improvements for V3?

Open aleh-rymasheuski opened this issue 2 years ago • 1 comments

@hramezani, @aqeelat, I reviewed the LogEntry model to check where we can speed up the auditlog and I've got a few suggestions:

  1. Drop object_id field. Rationale: object_pk is a superset of this field, it holds the string representation of the int primary key or the primary key if it's not int (object_id is blank in the latter case). Both are indexed, so we have twice the bloat from having two columns.
  2. Remove db_index from action field. Rationale: this field only has three distinct values, so the selectivity of this index is really low. I don't expect the performance of seq scan to be much worse than the performance of an index scan for a three-value field (measurement needed).
  3. Replace db_index on timestamp column to a composite index by timestamp and id fields. Rationale: Django admin interprets order by timestamp as order by timestamp, id because timestamp is not a unique field and we need complete sort order on the admin. Composite index on the two fields will be bulkier, but will enable both searches by timestamp and by timestamp then id fields.

These changes (particularly, the removal of the object_id field) will make the migration to V3 harder for the users but will improve the performance to a certain degree.

aleh-rymasheuski avatar Sep 01 '23 16:09 aleh-rymasheuski

Thanks @aleh-rymasheuski for investigating and preparing this issue.

  1. Drop object_id field. Rationale: object_pk is a superset of this field, it holds the string representation of the int primary key or the primary key if it's not int (object_id is blank in the latter case). Both are indexed, so we have twice the bloat from having two columns.

I think we already introduced a lot of breaking changes for V3. One other option would be to deprecate object_id diring V3 and remove it in V4.

  1. Remove db_index from action field. Rationale: this field only has three distinct values, so the selectivity of this index is really low. I don't expect the performance of seq scan to be much worse than the performance of an index scan for a three-value field (measurement needed).

I am ok to keep this index. but as you mentioned we need to measure it. then we can make a better decision.

  1. Replace db_index on timestamp column to a composite index by timestamp and id fields. Rationale: Django admin interprets order by timestamp as order by timestamp, id because timestamp is not a unique field and we need complete sort order on the admin. Composite index on the two fields will be bulkier, but will enable both searches by timestamp and by timestamp then id fields.

Seems a reasonable change. We need to know how long the migration takes for this.

hramezani avatar Sep 01 '23 17:09 hramezani