pg_search
pg_search copied to clipboard
:associated_against queries don't use indexes
Hi fellows,
I've created a couple of gin indexes to speed up my pg_search scopes. All works fine and the generated queries use the indexes if I do simple against (Ex. :against => :load_number,).
But as soon as I add :associated_against indexes don't get used anymore (or at least this is what postgres explain shows)
I do have indexes on the table/column that show in :associated_against
Any ideas what is going on here?
Regards,
I've opened a stackoverflow question with this one also, asking the pg community
http://stackoverflow.com/questions/11941253/how-i-can-index-this-query-for-free-text-search
Yeah, unfortunately I don't know of a way to get :associated_against queries to work against an index, at least the way things are currently implemented.
It's because :associated_against searches against all of the associated records' text joined together, rather than on a single record at a time basis.
So for instance if you joined into a tags table, and there are 3 records ("foo", "bar", and "baz"), then you would expect a search for "foo baz" to find it. The more easily indexable solution would only work for "foo" or "baz" queries, but not "foo baz", because no one of individual tags rows matches both terms, if you know what I mean.
It's not possible to index across multiple records (at least to my knowledge).
Maybe we could surface an option to do the search record-by-record, which could use the index, but not match across records.
Maybe for associations which will only map to one record (meaning a has_one or belongs_to association from the model being searched to the other record) I can turn off the code that aggregates all the text for multiple records. I am intrigued by this possibility, because it would dramatically improve the usefulness of :associated_against.
Also, another solution would be to use multisearch to index addresses. You could set things up so that the searchable text includes the name of the state. One caveat is that if you ever edit a State record, you will need to tell its Address records to update their pg_search_document by calling Address#update_pg_search_document
Let me know what you figure out. I know that PostgreSQL full-text search has many moving parts and can get confusing at times. I'm always open to hearing new ways of organizing all of it.
For now all I could come up with is doing a select from a virtual table
SELECT consolidated_address.id, (ts_rank((to_tsvector('simple', coalesce(consolidated_address.name::text, '')) || to_tsvector('simple', coalesce(consolidated_address.state_name::text, ''))), (to_tsquery('simple', ''' ' || 'Gallaway' || ' ''' || ':*')), 0)) AS pg_search_rank
FROM (
SELECT "addresses".id,
"addresses".name,
string_agg("states".name::text, ' ') as state_name
FROM addresses
LEFT OUTER JOIN "states"
ON "states".id = "addresses".state_id
GROUP BY "addresses".id) consolidated_address
WHERE
(((to_tsvector('simple', coalesce(consolidated_address.name::text, '')) || to_tsvector('simple', coalesce(consolidated_address.state_name::text, ''))) @@ (to_tsquery('simple', ''' ' || 'Gallaway' || ' ''' || ':*'))))
It's a little faster but still does no use any of the indexes
This might be crazy, but is there some way to use a view to join the text and look up the records through that view? (Can a view be indexed?)
No, you can only create an index on the underlying tables. A view is a virtual realtion, not a physical one, so what would the index point to?
I have an idea.
Right now we are using the built-in function string_agg to aggregate the content of the multiple rows into one string, then parse that into a tsvector. But doing some searching, I've found an agg_tsvector function that should work with tsvectors:
CREATE AGGREGATE public.agg_tsvector (
sfunc = public.tsvector_concat,
basetype = tsvector,
stype = tsvector
);
So if we change :associated_against to use this instead, we can pre-calculate tsvectors and index them, and hopefully this index will end up being usable.
That sound very promising, do you want me to give it a shoot and send a pull request? On Sep 10, 2012 1:45 AM, "Grant Hutchins" [email protected] wrote:
I have an idea.
Right now we are using the built-in function string_agg to aggregate the content of the multiple rows into one string, then parsing that into a tsvector. But doing some searching, I've found an agg_tsvector functionhttp://bzr.coffeecode.net/ILS/rel_1_2/Open-ILS/src/sql/Pg/020.schema.functions.sqlthat should work with tsvectors:
CREATE AGGREGATE public.agg_tsvector ( sfunc = public.tsvector_concat, basetype = tsvector, stype = tsvector);
So if we change :associated_against to use this instead, we can pre-calculate tsvectors and index them, and hopefully this index will end up being usable.
— Reply to this email directly or view it on GitHubhttps://github.com/Casecommons/pg_search/issues/51#issuecomment-8410296.
Sure, pull requests are always welcomed!
Did anyone make any progress on this? Using an index for an associated_against query?
up
I found out on this good article: http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
a solution where we need to create a materialized view, then index this view, then make queries against that view. But I think this won't work with pg_search, since it will query against multiple tables instead of a view. Amirite?
My question is: does make sense pg_search on come supporting something like this?
against_view: :my_cool_view
The way I understand it, a view can be accessed in queries exactly like a table. So if you were to make an ActiveRecord::Base subclass wrapping the view, you could put a pg_search_scope directly on that class.
I don't think something like against_view would work because it's uncertain what the correct return value would be. A view could have arbitrary schema, and thus it won't necessarily match the table backing your existing ActiveRecord::Base subclass.
Grant Hutchins
On Oct 30, 2014, at 9:09 AM, Leandro Maríngolo [email protected] wrote:
up
I found out on this goof article: http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
a solution where we need to create a materialized view, then index this view, then make queries against that view. But I think this won't work with pg_search, since it will query against multiple tables instead of a view. Amirite?
My question is: does make sense pg_search come supporting something like this?
against_view: :my_cool_view — Reply to this email directly or view it on GitHub.
Nice, I'll move forward on this way, by mapping a class to a view.
Thanks
Hi. So, are there any ways to make associated_against searches faster now? This problem is actual, because very popular 'globalize' gem creates associated tables to keep translations, so the pg_search must perfrom it`s searching in associated tables. It is sad if there no ways to make search procedure work as it should.
@nertzy I've been trying to see if there is a real alternative solution for this - if we allow an option for performing the search record-by-record as you mentioned. This basically means if you joined to a tags table with 'foo', 'bar' and 'baz' we would match a search of 'foo' but not 'foo baz'.
My idea is to do the matching within the association join nodes, which means we could use indexes - but it substantially changes the structure of the query and changes how the ranking works. It's not really clear to me what the expectation would be around the ranking in this case.
To illustrate what I'm going on about, here is the regular generated sql and the modified sql.
Before I go any further I just want to get some feedback on this idea. It would definitely make it much better for our purposes (in fact we can't actually use associated_against at all at the moment - because of this issue), but would like to hear from others.
I'm all for a solution that is indexable and that returns roughly the same set of records. I'm personally not too concerned about the relative ranking of results, as I feel like the current solution's ranking is fairly arbitrary.
There has also been some discussion about extracting associated_against into a separate gem to reduce complexity. Might be something worth exploring as well.