dogsheep-beta
Figure out incremental re-indexing
As tables get bigger reindexing everything on a schedule (essentially recreating the entire index from scratch) will start to become a performance bottleneck.
I thought about allowing tables to define an incremental indexing SQL query - maybe something that returns just the records touched in the past hour, or records added since a recorded "last indexed record" value.
The problem with this is deletes - if you delete a record, how does the indexer know to remove it? See #18 - that's already caused problems.
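That incremental query approach might be sketched like this - the `notes` table, `updated_at` column and `_index_state_` bookkeeping table are hypothetical, just for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, updated_at REAL);
-- Records the high-water mark of the last indexing run for each table
CREATE TABLE _index_state_ (table_name TEXT PRIMARY KEY, last_indexed_at REAL);
""")
db.execute("INSERT INTO notes VALUES (1, 'old note', 100.0)")
db.execute("INSERT INTO notes VALUES (2, 'new note', 200.0)")
db.execute("INSERT INTO _index_state_ VALUES ('notes', 150.0)")

# Only rows touched since the recorded high-water mark get re-indexed
rows = db.execute("""
    SELECT id, body FROM notes
    WHERE updated_at > (
        SELECT last_indexed_at FROM _index_state_ WHERE table_name = 'notes'
    )
""").fetchall()
print(rows)  # [(2, 'new note')]
# A deleted row simply vanishes from this result set, so the indexer
# never finds out it should remove that row from the index.
```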
A really clever way to do this would be with triggers. The indexer script would add triggers to each of the database tables that it is indexing - each in their own database.
Those triggers would then maintain an `_index_queue_` table. This table would record the primary keys of rows that are added, modified or deleted. The indexer could then work by reading through the `_index_queue_` table, re-indexing (or deleting) just the primary keys listed there, and then emptying the queue once it has finished.
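A minimal sketch of how those triggers could work, using a hypothetical `notes` table (the table, column and trigger names here are illustrative, not part of dogsheep-beta):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT);

-- Queue of primary keys that need re-indexing or removal from the index
CREATE TABLE _index_queue_ (pk INTEGER, op TEXT);

CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
    INSERT INTO _index_queue_ (pk, op) VALUES (new.id, 'upsert');
END;
CREATE TRIGGER notes_au AFTER UPDATE ON notes BEGIN
    INSERT INTO _index_queue_ (pk, op) VALUES (new.id, 'upsert');
END;
CREATE TRIGGER notes_ad AFTER DELETE ON notes BEGIN
    INSERT INTO _index_queue_ (pk, op) VALUES (old.id, 'delete');
END;
""")

# Regular writes against the table populate the queue automatically
db.execute("INSERT INTO notes (body) VALUES ('hello')")
db.execute("UPDATE notes SET body = 'hello again' WHERE id = 1")
db.execute("DELETE FROM notes WHERE id = 1")

queue = db.execute("SELECT pk, op FROM _index_queue_").fetchall()
print(queue)  # [(1, 'upsert'), (1, 'upsert'), (1, 'delete')]

# After processing the queued keys, the indexer empties the queue
db.execute("DELETE FROM _index_queue_")
```

Because the `delete` operations end up in the queue too, the indexer learns about removed rows - which the timestamp-query approach can't do.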
This would add a small amount of overhead to insert/update/delete queries run against the table. My hunch is that the overhead would be minuscule, but I could still allow people to opt out of this for tables with traffic so high that it would matter.