
Support join queries

Open katyho opened this issue 2 years ago • 2 comments

  • https://github.com/chanzuckerberg/redis-memo/pull/6 drops support for join queries because they could be unbound.

katyho avatar Apr 08 '22 15:04 katyho

The queries we're looking at here would have the following format.

select … from A join … where …

The query can select a set of fields, and the rows returned all come from a single table (suppose it's called A).

Case Work

Then the queries will have 2 general forms:

  1. equality conditions on all tables: A.x = B.x and A.a = ? and B.b = ?
  2. missing equality conditions (unbound queries) on some or all tables: e.g. A.x = B.x, or A.x = B.x and A.a = ?

Case 1

If all the relevant tables have equality filtering conditions on them, we can already cache the query with the current design.

For example, consider a filtering condition A.x = B.x and A.a = ? and B.b = ?. Suppose A.x, A.a, B.x, B.b are memoized_table_columns.

Then any record in the final result set will depend on the set of memoizables (A.x, A.a, B.x, B.b), which is handled by the version-bumping / invalidation flow.
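The Case 1 flow can be sketched with an in-memory stand-in for Redis (all names here — `VERSIONS`, `bump_version`, `cache_key` — are hypothetical, not redis-memo's actual API):

```ruby
require 'digest'

# In-memory stand-in for Redis: maps a column-value dependency
# (e.g. "A.a=2") to its current version number.
VERSIONS = Hash.new(0)

# Called whenever a record changes: bumping the version implicitly
# invalidates every cached query that depends on this column value.
def bump_version(dependency)
  VERSIONS[dependency] += 1
end

# Build a cache key from the query's bound conditions plus their
# versions; any version bump yields a different key, so a stale
# cache entry is simply never read again.
def cache_key(conditions)
  payload = conditions.sort.map { |c| "#{c}:#{VERSIONS[c]}" }.join('|')
  Digest::SHA1.hexdigest(payload)
end

conditions = ['A.x=1', 'A.a=2', 'B.x=1', 'B.b=3']

key_before = cache_key(conditions)
bump_version('B.b=3') # a row in B with b=3 changed
key_after = cache_key(conditions)
# key_before != key_after, so the old cached result is orphaned
```

Because every bound column value of every joined table appears in the key, a change to any matching record moves the key and the query is recomputed.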

Case 2

For tables with an equality condition, cached queries get invalidated when their records change.

But for tables that do not have an equality condition in the query, we might have a problem: if those records change, the cached queries would not get invalidated automatically. (This could be okay, though, if the selected fields never change, e.g. PKs or immutable fields.)

To actually solve the problem, here are two ideas that will hopefully inspire some more thoughts.

Idea 1: Add unbound tables as table-dependency

When a record on an unbound table changes, the table-level dependency also changes, hence the cached queries get invalidated. Con: if the table changes a lot, the cache hit rate will suffer.

The cache hit rate is likely to suffer because the cached queries must be invalidated whenever any of those tables changes.

Also note that this idea is not practical when the unbound tables are tables like Session / Post / Comment that are updated very frequently.
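Idea 1 could be sketched like this (again an illustrative in-memory stand-in; `TABLE_VERSIONS`, `record_changed`, and `table_cache_key` are hypothetical names):

```ruby
# In-memory stand-in: one version counter per unbound table.
TABLE_VERSIONS = Hash.new(0)

# ANY write to an unbound table bumps its table-wide version.
def record_changed(table)
  TABLE_VERSIONS[table] += 1
end

# The cache key folds in the version of every unbound table, so all
# cached queries against that table expire on any write; that is
# exactly why the hit rate suffers for hot tables.
def table_cache_key(sql, unbound_tables)
  versions = unbound_tables.sort.map { |t| "#{t}:#{TABLE_VERSIONS[t]}" }
  "#{sql}|#{versions.join('|')}"
end

sql = 'select ... from A join B on A.x = B.x where A.a = ?'
k1 = table_cache_key(sql, ['B'])
record_changed('B') # an unrelated row in B changed
k2 = table_cache_key(sql, ['B'])
# k1 != k2: the cached query is invalidated even though the
# changed row never appeared in its result set
```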

Idea 2: Post fetch validation

Current design:

  • extract a set of memoizables by inspecting the query
  • [slow] A trip to redis to get their versions
  • compute a cache key
  • [slow] A trip to redis to get the cached query
    • cache hit:
      • use the result
    • cache miss:
      • get a record set from the actual data source (e.g. Postgres)
      • use that
      • fill the cache under that cache key
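Under some stated assumptions (a plain Hash standing in for both the version store and the query cache, and a callable standing in for Postgres), the current flow might look like:

```ruby
# Sketch of the current flow: one trip for versions, one for the
# cached query, and a data-source fallback on a miss. `redis` is a
# Hash stand-in; `db` is a lambda stand-in for Postgres.
def fetch_query(sql, memoizables, redis:, db:)
  # [slow] trip 1: get the memoizables' versions, then compute a key
  versions = memoizables.sort.map { |m| "#{m}:#{redis.fetch("v:#{m}", 0)}" }
  key = "q:#{sql}|#{versions.join(',')}"

  # [slow] trip 2: get the cached query
  cached = redis[key]
  return cached if cached # cache hit: use the result

  result = db.call(sql)   # cache miss: hit the actual data source
  redis[key] = result     # fill the cache under that cache key
  result
end

redis = {}
db_calls = 0
db = ->(sql) { db_calls += 1; [{ id: 1 }] }

first  = fetch_query('select ...', ['A.a=1'], redis: redis, db: db)
second = fetch_query('select ...', ['A.a=1'], redis: redis, db: db)
# the second call is served from the cache; db_calls is still 1
```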

Alternative:

  • extract a set of memoizables by inspecting the query
  • [slow] A trip to redis to get their versions
  • compute a cache key
  • [slow] A trip to redis to get the cached query
    • cache hit:
      • extract another set of memoizables by inspecting the result set
      • [slow] A trip to redis to get their versions
      • compute another_cache_key
      • [slow] A trip to redis to check if another_cache_key exists
        • yes: use the result
        • no: create another_cache_key, go to cache miss
    • cache miss:
      • get a record set from the actual data source (e.g. Postgres)
      • use that
      • fill the cache under that cache key (if there aren't too many records)
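The post-fetch validation path could be sketched under the same in-memory stand-ins (the `:deps` field on each row, standing in for memoizables extracted from the result set, is an assumption for illustration, as are all names here):

```ruby
# Sketch of Idea 2: after a cache hit, extract a second set of
# memoizables from the RESULT rows and verify that a key built from
# their versions still exists before trusting the cached result.
def fetch_validated(sql, memoizables, redis:, db:)
  versions = memoizables.sort.map { |m| "#{m}:#{redis.fetch("v:#{m}", 0)}" }
  key = "q:#{sql}|#{versions.join(',')}"

  if (cached = redis[key])
    # Inspect the result set for dependencies on the unbound tables.
    row_deps = cached.flat_map { |row| row[:deps] }.uniq.sort
    another_key =
      'ok:' + row_deps.map { |d| "#{d}:#{redis.fetch("v:#{d}", 0)}" }.join(',')
    return cached if redis[another_key] # yes: still valid, use it
    redis[another_key] = true           # no: create it, fall through to miss
  end

  result = db.call(sql) # cache miss: hit the actual data source
  redis[key] = result
  result
end

redis = {}
db_calls = 0
db = ->(sql) { db_calls += 1; [{ id: 1, deps: ['B.pk=1'] }] }

fetch_validated('select ...', ['A.a=1'], redis: redis, db: db) # miss
fetch_validated('select ...', ['A.a=1'], redis: redis, db: db) # hit, new another_key
fetch_validated('select ...', ['A.a=1'], redis: redis, db: db) # fully validated hit
```

If a version of any result-set dependency is bumped, `another_key` changes, the existence check fails, and the query falls back to the data source; this is how unbound-table changes get caught without table-wide invalidation.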

Pro: Potentially higher cache hit rate.

Con: Slower compared to Idea 1 on the cache-hit path (two extra Redis trips).

donaldong avatar May 06 '22 01:05 donaldong

I'm not sure I understand the post-fetch validation approach. What is the 2nd set of memoizables supposed to be checking, or what are those dependencies/how do they differ from the first set?

katyho avatar May 06 '22 21:05 katyho