pg_search
pg_search copied to clipboard
Searching for email addresses
I'm trying to search for users based on either name or email address. As far as I can tell postgresql interprets the '@' symbol as a special character and breaks my search.
Here is my setup on the search:
pg_search_scope :search_by_name_and_email,
against: { name: "A", email: "B" },
using: { tsearch: { prefix: true }}
When I pass through a query of 'user@ex' I get the following sql:
SELECT "users".*, ((ts_rank((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*')), 0))) AS pg_search_rank FROM "users" WHERE (((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*')))) ORDER BY pg_search_rank DESC, "users"."id" ASC
It seems like postgresql is breaking up my query at the '@' token, based on the following:
select * from ts_parse('default', 'user@ex');
tokid | token
-------+-------
1 | user
12 | @
1 | ex
(3 rows)
select * from ts_debug('simple', 'user@ex');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+--------------+------------+---------
asciiword | Word, all ASCII | user | {simple} | simple | {user}
blank | Space symbols | @ | {} | |
asciiword | Word, all ASCII | ex | {simple} | simple | {ex}
(3 rows)
Should I be escaping my query in some way, or does pg_search simply not support searching by fields which contain an '@' symbol? (I'm guessing we would have the same problem with other 'special' characters in posgresql, such as '&', '|' or '_'.
If PostgreSQL is treating @ as a space, then I think it should be added to our DISALLOWED_CHARACTERS array. Shouldn't be too difficult to add a few test cases and then update the implementation. I'd want to make sure queries for longer, more complete email addresses still work, because I believe the tsearch parser recognizes full email addresses.
If you or anyone can make a pull request, it would be appreciated as well.
I might be able to get to this soon. Otherwise
Grant Hutchins
On Feb 18, 2015, at 2:30 PM, Jaco Pretorius [email protected] wrote:
I'm trying to search for users based on either name or email address. As far as I can tell postgresql interprets the '@' symbol as a special character and breaks my search.
Here is my setup on the search:
pg_search_scope :search_by_name_and_email, against: { name: "A", email: "B" }, using: { tsearch: { prefix: true }} When I pass through a query of 'user@ex' I get the following sql:
SELECT "users"., ((ts_rank((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':')), 0))) AS pg_search_rank FROM "users" WHERE (((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*')))) ORDER BY pg_search_rank DESC, "users"."id" ASC It seems like postgresql is breaking up my query at the '@' token, based on the following:
select * from ts_parse('default', 'user@ex'); tokid | token -------+------- 1 | user 12 | @ 1 | ex (3 rows) select * from ts_debug('simple', 'user@ex'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+--------------+------------+--------- asciiword | Word, all ASCII | user | {simple} | simple | {user} blank | Space symbols | @ | {} | | asciiword | Word, all ASCII | ex | {simple} | simple | {ex} (3 rows) Should I be escaping my query in some way, or does pg_search simply not support searching by fields which contain an '@' symbol? (I'm guessing we would have the same problem with other 'special' characters in posgresql, such as '&', '|' or '_'.
— Reply to this email directly or view it on GitHub.
Hey @nertzy thanks for the quick reply. I actually did dig around (before posting above) and tried adding the '@' to DISALLOWED_CHARACTERS, but it didn't make a difference :-/
We just ran into this on version 0.7.6. Hacked solution was to clean the query param in the controller:
filtered_query = params[:query] ? params[:query].split(/\W/) : nil
if filtered_query.present?
model.search...
Then we decided to upgrade to latest version 1.0.4, and the issue has apparently been resolved, hack is no longer needed. I think this issue can be closed.
@Jaco-Pretorius Does upgrading to newer PgSearch solve this for you too?
Is there a workaround for this? I am using Postgres 9.4.4 and pg_search-1.0.5
I'm using pg_search 1.0.6 and Postgres 9.4. Having the same issue. If I have [email protected] as the value in my email column, and my query is bob, I get an empty array. The only way I get a match is if my query is exactly [email protected] or is bob. If I try to search for anything after the @ sign, no results are returned. So for example, example.com or @example.com return nothing.
I'm still having issues with this as well using Postgres 9.4 and 1.06 including @ will not match any emails
In case anyone comes here with the same problem @danielbonnell has in the comment above
The answer is that trigram search just simply does not work like that.
Here is a good explanation on why it won't work.
You're just going to have to use good-old LIKE
@vinhboy I fortunately realized I didn't need trigram for my use case. So I changed my setup and now it works as expected.
class Contact < ActiveRecord::Base
pg_search_scope :search_for, against: [:first_name, :last_name, :email],
using: {
tsearch: { prefix: true, any_word: true }
},
ranked_by: '(contacts.engagement)',
order_within_rank: 'contacts.last_sign_in_at DESC'
end
Here is one potential solution: https://stackoverflow.com/a/35669178
To solve this, I used a cached column to store tsvector on creation time where I stored results of to_tsvector('simple', regexp_split_to_array(email, E'\[.@]')::text). Then I used the above using query to match.
@sp2410 if I try this method I get an error as soon as I try to create the column, how did you bypass that?
ALTER TABLE users
ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('simple', regexp_split_to_array(coalesce(email,''), E'\[.@]')::text), 'A') ||
setweight(to_tsvector('simple', coalesce(last_name,'')), 'B') ||
setweight(to_tsvector('simple', coalesce(first_name,'')), 'C')
) STORED;
Error:
PG::InvalidObjectDefinition: ERROR: generation expression is not immutable
As soon as I remove the regex_split_to_array call, the migration resumes working.
PR #509 would help with this by just using a basic ilike search. Let me know if this would work for you.