pg_search icon indicating copy to clipboard operation
pg_search copied to clipboard

Allow for further filtering of subquery

Open m5rk opened this issue 1 year ago • 12 comments

Source:

https://github.com/dwillett/pg_search/commit/c6418b5d4c6a6566aff2c854ad39b11d794d29a6

From: https://github.com/Casecommons/pg_search/issues/292

The problem seems to be that the notion of additional_attributes is not really elaborated enough. I suspect in most cases, it's to facilitate segregating tenants, like here:

https://www.donnfelker.com/multitenant-pgsearch-how-to/

But if they wonder why they have terrible performance and look at the explain plan, they'll see why: The subquery ignores the opportunity to restrict the search to the tenant.

m5rk avatar Apr 02 '23 21:04 m5rk

Wow, using you're fork + using the same filtering condition that is used outside the inner query, I get a speedup of ca. 10x for a troublesome search!

Amazing :sparkles:

Zinggi avatar Apr 06 '23 09:04 Zinggi

This is interesting! We need to get some spec coverage on this to make sure it works as intended. I'm curious to dig in more.

nertzy avatar Apr 07 '23 23:04 nertzy

This is interesting! We need to get some spec coverage on this to make sure it works as intended. I'm curious to dig in more.

I'm happy to dig in as well. The only thing I can think of is something along these lines:

context 'when a block is given' do
  it 'has access to the subquery' do
    # Examine the content of the explain plan or .to_sql
  end
end

Update: @nertzy See my latest commit, which adds coverage that illustrates how the block is used.

m5rk avatar Apr 08 '23 14:04 m5rk

Assuming the code and specs are good, it seems like we may also want to update the docs to mention this capability. But I didn't want to spend time doing that until I get some confirmation that this approach (all credit to @dwillett) is approved by the maintainers.

m5rk avatar Apr 08 '23 16:04 m5rk

@m5rk, do you have any updates on this? We also use the pg_search gem in a multi-tenant environment. However, with the growing number of records across multiple tenants, we are experiencing performance issues due to the inability to scope the subquery.

LuukvH avatar Jun 28 '23 10:06 LuukvH

@m5rk, do you have any updates on this? We also use the pg_search gem in a multi-tenant environment. However, with the growing number of records across multiple tenants, we are experiencing performance issues due to the inability to scope the subquery.

No update on my end, sorry. I'm just waiting for the maintainers to give feedback or guidance on what's missing or to merge and then publish a new gem version. @nertzy, thoughts?

m5rk avatar Jun 28 '23 13:06 m5rk

@nertzy Another option could be to remove the unscoped on the subquery which was added in version 1.1 it seems to remove unnecessary joins. Which in my opinion is a concern to be handled by the developer and not this gem. This would also allow this gem to work with tenant gems that uses default scopes out of the box.

LuukvH avatar Jun 28 '23 21:06 LuukvH

Sorry for another "bumb" here, but is there something that's missing here for this to be merged? Maybe I could help out somehow?

The PR looks pretty good to me, and we've been using this branch in production since about half a year without issue. Without the ability to filter on the sub query, we couldn't use this gem at all for performance reasons...

Zinggi avatar Oct 02 '23 11:10 Zinggi

@nertzy Any thoughts? I added coverage as you requested. As I noted earlier, maybe the only remaining item is to update the documentation?

m5rk avatar Oct 02 '23 15:10 m5rk

If anyone wants a drop-in fix. This worked for us.

Drop this monkey patch somewhere (e.g. initializers in rails)

It basically works by scoping the inner query by the scope that goes in before you call pg_search_all.

# frozen_string_literal: true

# Monkey patch that adds scope pushing down conditions to subselect
module PgSearch
  class ScopeOptions
    def apply(scope)
      scope = include_table_aliasing_for_rank(scope)
      rank_table_alias = scope.pg_search_rank_table_alias(include_counter: true)

      scope
        .joins(rank_join(rank_table_alias, scope))
        .order(Arel.sql("#{rank_table_alias}.rank DESC, #{order_within_rank}"))
        .extend(WithPgSearchRank)
        .extend(WithPgSearchHighlight[feature_for(:tsearch)])
    end

    def rank_join(rank_table_alias, scope)
      "INNER JOIN (#{subquery(scope).to_sql}) AS #{rank_table_alias} ON #{primary_key} = #{rank_table_alias}.pg_search_id"
    end

    def subquery(scope)
      scope
        .select("#{primary_key} AS pg_search_id")
        .select("#{rank} AS rank")
        .joins(subquery_join)
        .where(conditions)
        .limit(nil)
        .offset(nil)
    end
  end
end

jsuchal avatar Dec 21 '23 14:12 jsuchal

@nertzy Anything else I can do to help get this reviewed and merged?

m5rk avatar Apr 13 '24 10:04 m5rk