pg_search
pg_search copied to clipboard
ranked_by greatest similarity by column?
I am searching over 4 columns, with an exact match in one column, using trigram.
From looking at the sql generated, the rank ( used in the sql ORDER ) is being generated from a concat of all the columns, so:
(similarity((coalesce("products"."short_name"::text, '') || ' ' || coalesce("products"."name"::text, '') || ' ' || coalesce("products"."gmi_code"::text, '') || ' ' || coalesce("products"."currency "::text, '')), 'EMD')) AS rank
....
ORDER BY pg_search.rank DESC, "products"."id" ASC
What this means for my use case, is that the other columns are polluting the results. If I search for foo
and I have a foo
and a bar foo baz
it seems I don't get the search results in an order i'd expect.
In my use case, I'd like to find results for the best matching column:
(similarity(coalesce("products"."short_name"::text, ''), 'EMD')) AS srank,
(similarity(coalesce("products"."name"::text, ''), 'EMD')) AS nrank,
(similarity((coalesce("products"."short_name"::text, '') || ' ' || coalesce("products"."name"::text, '') || ' ' || coalesce("products"."gmi_code"::text, '') || ' ' || coalesce("products"."currency "::text, '')), 'EMD')) AS rank
...
ORDER BY greatest(pg_search.srank, pg_search.nrank, pg_search.rank) DESC, "products"."id" ASC
This returns exact matches, and fuzzy matches by using the greatest rank
The following was a search for EMD
, which over the entire document is not a good match, but has very good similarity over a particular column.
name | short_name | rank | srank | nrank
-------------------------+------------+-----------+----------+-------
M2M OPk | EMDOXYZ | 0.09375 | 0.333333 | 0
M2M PFC OPk | EMDUXYZ | 0.0857143 | 0.333333 | 0
M2M PFC Pk | EMDTXYZ | 0.0909091 | 0.333333 | 0
M2M Pk | EMDMXYZ | 0.1 | 0.333333 | 0
I'm not certain if this is something I am doing wrong in my scope, or I can define a custom rank_by that can rank multiple items and choose a best fit.
I would be interested in adding this feature if it's not supported, but I don't know if my use case is really generally applicable and a good fit for this library.
I considered an API like
pg_search_scope :search_full_text,
against: {
short_name: "A",
name: "D",
gmi_code: "D",
currency: "D"
},
using: {
tsearch: { prefix: true }
},
ranked_by: :similarity
but then i wanted something more flexible like
ranked_by: { greatest: [:short_name, :name]}
greatest, could also be other functions, like least
I've considered just monkey patching the rank methods on :trigram, but If this feels like a good fit, I'd rather try to whip up a pr.
Very old request, but I need this as well :) @kluzny Have you found a solution for that?