gotosocial
gotosocial copied to clipboard
[performance] Experiment to see if it's possible to optimize database queries using views
Any database query that has joins between tables might be a good candidate for a database view.
A database view is like a read-only de-normalized table that gets automatically updated when one of the tables that the view is generated from changes.
Here is an example for notifications:
Tobi described the process of selecting notifications like this:
https://github.com/superseriousbusiness/gotosocial/blob/main/internal/db/bundb/notification.go#L57-L108
[paraphrasing the linked code:
select * from notifications where target_account_id = ? and id > ? and id < ?
]
the table indexes on id, but not on target_account_id, so that would probably help i guess it's also just a bit slow because serializing a notification takes quite a few calls to the db as well you need to fetch the owning account, the target account, and the status the notification pertains to (if it's a status)
Forest sez:
This sounds insane to me. IMO the serialization of the entities should be stateless "pure function", the input to serialization should be DB rows and the output should be JSON or objects or whatever.
The problem is twofold:
- its doing a table scan; the where clause includes
target_account_id
but there's no index alongtarget_account_id
on the notifications table. - we need information from the
accounts
table in order to display the notification (${owning_account.Username} liked your post!
) and that information is not on the notification row
- if it was, that'd mean the same information is written twice in different places, or "de-normalized"
- de-normalizing by hand can bring problems to developers. for example, what happens if someone changes their username? do we have to update every notification related to them now?
We can fix part 1 by simply adding an index along target_account_id, id
to the notifications table. But we can hit both birds with 1 stone with a view.
A view is a nice way to de-normalize because you can think of it as a glorified index, it does not need to be updated by the programmer, it should be updated automatically any time one of its underlying tables is changed.
https://www.tutorialspoint.com/sqlite/sqlite_views.htm
A view is also nice because views can be created, deleted, and updated without interfering with the data/schema in the tables at all, they are derived by the database engine from the view query + the data in the tables.
The query used to generate the view would look something like this:
select
notification.id
notification.target_account_id,
notification.owning_account_id,
owning_account.username, owning_account.etc, # include all rows needed to display the notification.
notification.etc # include all rows needed to display the notification.
from notification
join accounts as owning_account on notification.owning_account_id = owning_account.id
join accounts as target_account on notification.target_account_id = target_account.id
PRIMARY KEY [notification.target_account_id, notification.id]
this way when we run
select * from notification_view where target_account_id = ? and id > ? and id < ?
that where clause will map to a single region of bytes stored on disk in the notification_view
the the returned rows will have everything needed to serialize the notification
well, except for the status details. But it should be fine to do another request for the status details.
It will probably be better/faster to:
- grab all the notification_view rows, then
- make a list of the status IDs that you need, then
- send a single query to get all those statuses:
select * from status where (id = ? or id = ? or id = ? or id = ? or id = ? or ...)
or maybe
select * from status where id in ?
where ? is a list of IDs
Alternatively if the info needed from the status is not very much / not very large, it might be better to simply add it to the notification view:
select
notification.id
notification.target_account_id,
notification.owning_account_id,
owning_account.username, owning_account.etc, # include all account rows needed to display the notification.
notification.etc, # include all rows needed to display the notification.
status.url, status.short_preview_text # include all status rows needed to display the notification.
from notification
join accounts as owning_account on notification.owning_account_id = owning_account.id
join accounts as target_account on notification.target_account_id = target_account.id
left outer join status on notification.status_id = status.id
PRIMARY KEY [notification.target_account_id, notification.id]
IIRC the left outer join means that if notification.status_id
is null, it wont omit that notification row, it will just select null for the rows on the status table.
For now I managed to resolve some of the issues with database slowness by adding indexes in appropriate places (see #419), so that's part 1 of the fix. Still didn't experiment with views though so best to leave this issue open and come back to it if/when we need it.
I believe you're thinking of materialized views.
SQL light does not support materialized views, only "virtual" views which resolved to normal SQL on the base tables.
https://techdifferences.com/difference-between-view-and-materialized-view.html
Since we haven't really needed to do this, and proper indexing solved a lot of issues we were having, i'm gonna close this (just doing some spring cleaning!)