pg_search icon indicating copy to clipboard operation
pg_search copied to clipboard

Searching for email addresses

Open Jaco-Pretorius opened this issue 10 years ago • 13 comments

I'm trying to search for users based on either name or email address. As far as I can tell postgresql interprets the '@' symbol as a special character and breaks my search.

Here is my setup on the search:

  pg_search_scope :search_by_name_and_email,
    against: { name: "A", email: "B" },
    using: { tsearch: { prefix: true }}

When I pass through a query of 'user@ex' I get the following sql:

SELECT "users".*, ((ts_rank((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*')), 0))) AS pg_search_rank FROM "users"  WHERE (((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*'))))  ORDER BY pg_search_rank DESC, "users"."id" ASC

It seems like postgresql is breaking up my query at the '@' token, based on the following:

select * from ts_parse('default', 'user@ex');
 tokid | token 
-------+-------
     1 | user
    12 | @
     1 | ex
(3 rows)
select * from ts_debug('simple', 'user@ex');
   alias   |   description   | token | dictionaries | dictionary | lexemes 
-----------+-----------------+-------+--------------+------------+---------
 asciiword | Word, all ASCII | user  | {simple}     | simple     | {user}
 blank     | Space symbols   | @     | {}           |            | 
 asciiword | Word, all ASCII | ex    | {simple}     | simple     | {ex}
(3 rows)

Should I be escaping my query in some way, or does pg_search simply not support searching by fields which contain an '@' symbol? (I'm guessing we would have the same problem with other 'special' characters in posgresql, such as '&', '|' or '_'.

Jaco-Pretorius avatar Feb 18 '15 20:02 Jaco-Pretorius

If PostgreSQL is treating @ as a space, then I think it should be added to our DISALLOWED_CHARACTERS array. Shouldn't be too difficult to add a few test cases and then update the implementation. I'd want to make sure queries for longer, more complete email addresses still work, because I believe the tsearch parser recognizes full email addresses.

If you or anyone can make a pull request, it would be appreciated as well.

I might be able to get to this soon. Otherwise

Grant Hutchins

On Feb 18, 2015, at 2:30 PM, Jaco Pretorius [email protected] wrote:

I'm trying to search for users based on either name or email address. As far as I can tell postgresql interprets the '@' symbol as a special character and breaks my search.

Here is my setup on the search:

pg_search_scope :search_by_name_and_email, against: { name: "A", email: "B" }, using: { tsearch: { prefix: true }} When I pass through a query of 'user@ex' I get the following sql:

SELECT "users"., ((ts_rank((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':')), 0))) AS pg_search_rank FROM "users" WHERE (((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*')))) ORDER BY pg_search_rank DESC, "users"."id" ASC It seems like postgresql is breaking up my query at the '@' token, based on the following:

select * from ts_parse('default', 'user@ex'); tokid | token -------+------- 1 | user 12 | @ 1 | ex (3 rows) select * from ts_debug('simple', 'user@ex'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+--------------+------------+--------- asciiword | Word, all ASCII | user | {simple} | simple | {user} blank | Space symbols | @ | {} | | asciiword | Word, all ASCII | ex | {simple} | simple | {ex} (3 rows) Should I be escaping my query in some way, or does pg_search simply not support searching by fields which contain an '@' symbol? (I'm guessing we would have the same problem with other 'special' characters in posgresql, such as '&', '|' or '_'.

— Reply to this email directly or view it on GitHub.

nertzy avatar Feb 18 '15 23:02 nertzy

Hey @nertzy thanks for the quick reply. I actually did dig around (before posting above) and tried adding the '@' to DISALLOWED_CHARACTERS, but it didn't make a difference :-/

Jaco-Pretorius avatar Feb 18 '15 23:02 Jaco-Pretorius

We just ran into this on version 0.7.6. Hacked solution was to clean the query param in the controller:

    filtered_query = params[:query] ? params[:query].split(/\W/) : nil
    if filtered_query.present?
     model.search...

Then we decided to upgrade to latest version 1.0.4, and the issue has apparently been resolved, hack is no longer needed. I think this issue can be closed.

pboling avatar Aug 06 '15 19:08 pboling

@Jaco-Pretorius Does upgrading to newer PgSearch solve this for you too?

amarshall avatar Sep 09 '15 05:09 amarshall

Is there a workaround for this? I am using Postgres 9.4.4 and pg_search-1.0.5

real-ashwin avatar Feb 11 '16 07:02 real-ashwin

I'm using pg_search 1.0.6 and Postgres 9.4. Having the same issue. If I have [email protected] as the value in my email column, and my query is bob, I get an empty array. The only way I get a match is if my query is exactly [email protected] or is bob. If I try to search for anything after the @ sign, no results are returned. So for example, example.com or @example.com return nothing.

danielbonnell avatar Oct 18 '16 16:10 danielbonnell

I'm still having issues with this as well using Postgres 9.4 and 1.06 including @ will not match any emails

elithecho avatar Oct 19 '16 03:10 elithecho

In case anyone comes here with the same problem @danielbonnell has in the comment above

The answer is that trigram search just simply does not work like that.

Here is a good explanation on why it won't work.

You're just going to have to use good-old LIKE

vinhboy avatar Jan 19 '17 07:01 vinhboy

@vinhboy I fortunately realized I didn't need trigram for my use case. So I changed my setup and now it works as expected.

class Contact < ActiveRecord::Base
    pg_search_scope :search_for, against: [:first_name, :last_name, :email],
        using: {
            tsearch: { prefix: true, any_word: true }
        },
        ranked_by: '(contacts.engagement)',
        order_within_rank: 'contacts.last_sign_in_at DESC'
end

danielbonnell avatar Jan 19 '17 23:01 danielbonnell

Here is one potential solution: https://stackoverflow.com/a/35669178

sbatyuk avatar Aug 13 '19 22:08 sbatyuk

To solve this, I used a cached column to store tsvector on creation time where I stored results of to_tsvector('simple', regexp_split_to_array(email, E'\[.@]')::text). Then I used the above using query to match.

sp2410 avatar Sep 06 '22 23:09 sp2410

@sp2410 if I try this method I get an error as soon as I try to create the column, how did you bypass that?

ALTER TABLE users
        ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
          setweight(to_tsvector('simple', regexp_split_to_array(coalesce(email,''), E'\[.@]')::text), 'A') ||
          setweight(to_tsvector('simple', coalesce(last_name,'')), 'B') ||
          setweight(to_tsvector('simple', coalesce(first_name,'')), 'C')
        ) STORED;

Error:

PG::InvalidObjectDefinition: ERROR:  generation expression is not immutable

As soon as I remove the regex_split_to_array call, the migration resumes working.

cabello avatar Dec 09 '22 10:12 cabello

PR #509 would help with this by just using a basic ilike search. Let me know if this would work for you.

jbennett avatar Mar 24 '23 18:03 jbennett