dogsheep-beta icon indicating copy to clipboard operation
dogsheep-beta copied to clipboard

Figure out incremental re-indexing

Open simonw opened this issue 4 years ago • 2 comments

As tables get bigger reindexing everything on a schedule (essentially recreating the entire index from scratch) will start to become a performance bottleneck.

simonw avatar Sep 08 '20 05:09 simonw

I thought about allowing tables to define a incremental indexing SQL query - maybe something that can return just records touched in the past hour, or records since a recorded "last indexed record" value.

The problem with this is deletes - if you delete a record, how does the indexer know to remove it? See #18 - that's already caused problems.

simonw avatar Sep 08 '20 05:09 simonw

A really clever way to do this would be with triggers. The indexer script would add triggers to each of the database tables that it is indexing - each in their own database.

Those triggers would then maintain a _index_queue_ table. This table would record the primary key of rows that are added, modified or deleted. The indexer could then work by reading through the _index_queue_ table, re-indexing (or deleting) just the primary keys listed there, and then emptying the queue once it has finished.

This would add a small amount of overhead to insert/update/delete queries run against the table. My hunch is that the overhead would be miniscule, but I could still allow people to opt-out for tables that are so high traffic that this would matter.

simonw avatar Sep 08 '20 05:09 simonw