Feature: "watched_columns" speedup in more 10000 times for millions requests
First of all, thank you for this amazing project — it has been incredibly useful! 🙌
In our high-load system, we handle millions of updates per day, and during peak times the Redis queues can contain hundreds of thousands of payloads. A significant portion of these updates turned out to be redundant, especially in two cases:

- Many-to-many or relation tables that are frequently updated with the same data (no real value change).
- Large tables where only a few columns are relevant for the OpenSearch document, but an update to any column currently triggers a full document recalculation.
I introduced a simple improvement that helps avoid unnecessary re-indexing and can significantly boost performance in similar high-throughput environments (see the sketch below):

- You can now define a `watched_columns` list inside the document model. Only updates to these columns will trigger a document refresh in OpenSearch.
- Previously, if a table had 20 columns but only 2 of them were defined in `columns`, any update to the remaining 18 still caused a full reindex, even when the relevant data didn't change.
- With `watched_columns`, this overhead is eliminated.
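To make the idea concrete, here is a minimal sketch of a node definition with the new attribute. The shape below is an assumption based on the description above, not the PR's verbatim schema:

```python
# Illustrative node definition (shape is an assumption, not the PR's exact
# schema). Only UPDATEs that actually change a column listed in
# `watched_columns` trigger a refresh of the OpenSearch document;
# INSERT and DELETE behave as before.
node = {
    "table": "user_role",
    "columns": ["user_id", "role_id", "expires_at"],
    "watched_columns": ["user_id", "role_id", "expires_at"],  # new optional attribute
}
```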
In our setup, this optimization reduced redundant updates by several orders of magnitude — even under peak load, the Redis queue rarely exceeds 10 messages. It works especially well for frequent re-updates in M2M tables.
Backward compatibility

- Default behavior remains unchanged.
- If `watched_columns` is not defined, all updates will continue to trigger document recalculations as before.

Tests

- Added.

Docs

- README.rst: updated.
- HISTORY.rst: updated.
@toluaina Good morning, Sir. I'm waiting for your review ☀️ and hope this feature will be useful for high-load systems.
Apologies for the delay. I needed some time to review this. (I've spent a fair amount of time recently fixing bugs with exclusions.) Filtering updates by watched columns in this way has some implications. For example, with through tables you wouldn’t receive updates and could miss records.
Your approach gave me an idea, so I’ve started a feature branch to explore it. Could you take a look and share your thoughts?
Good evening! Thank you for answering.
> For example, with through tables you wouldn’t receive updates and could miss records.
We skip only on the UPDATE operation. In common cases, M2M tables use INSERT for creating rows and DELETE for removing them. So even if such a table contains extra columns, like:
| user_id (FK) | role_id (FK) | expires_at |
|---|---|---|
| 1 | 2 | 23-12-2025 |
| 1 | 3 | 23-12-2025 |
| 2 | 3 | 20-12-2025 |
Or
| id (PK) | user_id (FK) | role_id (FK) | expires_at |
|---|---|---|---|
| 1 | 1 | 2 | 23-12-2025 |
| 2 | 1 | 3 | 23-12-2025 |
| 3 | 2 | 3 | 20-12-2025 |
We can set `watched_columns` to `["user_id", "role_id", "expires_at"]`, and then we will push an update to Redis, pull it, and update the OpenSearch document only on a real update where the data actually changes. Or we can leave `watched_columns` unset and everything will work as before. A minimal sketch of this skip logic is shown below.
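A minimal Python sketch of that skip logic, assuming a hypothetical trigger payload with `tg_op`, `old`, and `new` keys (the actual payload shape in the PR may differ):

```python
# Hypothetical sketch: drop an UPDATE event when none of the watched
# columns actually changed; INSERT and DELETE are always processed.
def should_process(payload: dict, watched_columns: list[str] | None) -> bool:
    if payload["tg_op"] != "UPDATE" or not watched_columns:
        return True  # feature disabled or not an UPDATE: behave as before
    old, new = payload["old"], payload["new"]
    # queue for reindexing only if a watched value really changed
    return any(old.get(col) != new.get(col) for col in watched_columns)
```

With the tables above, an UPDATE that rewrites a row with identical `user_id`, `role_id`, and `expires_at` values would be skipped instead of queued in Redis.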
Maybe I missed something, but in our cases everything is okay with through tables.
> Could you take a look and share your thoughts?
Sure 👍🏻
@toluaina Good evening. I looked at your idea – it's very unusual. I didn't think something like this could be done with practically pure SQL. On the one hand, it's very cool and could work faster; on the other hand, processing the logic in Python feels more familiar to me. It also seems that in the current SQL version, if the user has not specified anything in the columns, then any update will be ignored/skipped, which may be a little non-obvious to the average user. Plus, something like this comes to mind:
```sql
WHERE n.key = ANY(_columns || _foreign_keys || _primary_keys)
```
Otherwise, foreign keys will be skipped unless the user explicitly specifies them. Although, of course, I could be wrong.
In my case, with the Python-side skipping, the NEW and OLD parts of the message always include the PKs, FKs, and `watched_columns` (if set). Therefore, the user doesn't need to specify primary or foreign keys in this attribute; it's enough to list the additional fields that we look at during processing. Less thinking for the user means less likelihood of unexpected behavior. A small sketch of that column set is below.
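For clarity, a tiny sketch of the column set described above (names are illustrative, not the PR's actual helpers): primary and foreign keys are always part of the payload, so `watched_columns` only needs the extra fields.

```python
# Illustrative only: PKs and FKs are always included in the trigger payload,
# so users list only the additional columns they care about.
def payload_columns(
    primary_keys: set[str],
    foreign_keys: set[str],
    watched_columns: set[str] | None = None,
) -> set[str]:
    return primary_keys | foreign_keys | (watched_columns or set())

payload_columns({"id"}, {"user_id", "role_id"}, {"expires_at"})
# -> {"id", "user_id", "role_id", "expires_at"}
```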
@TerionGVS5 That’s an interesting approach! I implemented something similar in this PR, but in my case, I excluded these columns from the database logs directly in the trigger.
From what I understand in your PR, you’re handling the filtering at the log level instead—did I get that right?
@DrRiyadh Greetings, something like that, yes. I tried not to change the existing logic too much, but simply to extend it. That also made it easier to see how everything worked: for example, making an update on the front end that doesn't actually change the data, or that changes data we're not interested in, and then checking the logs to see that the change was recorded by us but allowed through according to our rules.