django-watson
Getting extremely broad search results when searching on username field
I am running into a weird issue: when searching on a username (like [email protected]), for certain users I get extremely broad results, including users that definitely do not have that phrase in their title, description, or content fields. In one case I get 7000+ results in my queryset, even though the email in question is definitely associated with only one entry in my index.
To make things more confusing, some searches return what I expect. If I search for "[email protected]", for instance, I get exactly one result, as expected since username is a unique field.
Here is my app config:
class UsersAppConfig(AppConfig):
    """
    Automatically import standalone signals file once app is ready.
    Get around a circular import error otherwise facing.
    """
    name = 'users'
    def ready(self):
        import signals
        from django.contrib.auth.models import User
        watson.register(
            User, CaseInsensitiveSearchAdapter, fields=(
                'first_name',
                'last_name',
                'username'
            )
        )
And the custom adapter I created based on some code you posted:
class CaseInsensitiveSearchAdapter(watson.SearchAdapter):
    def get_title(self, obj):
        return super(
            CaseInsensitiveSearchAdapter, self
        ).get_title(obj).lower()
    def get_description(self, obj):
        return super(
            CaseInsensitiveSearchAdapter, self
        ).get_description(obj).lower()
    def get_content(self, obj):
        return super(
            CaseInsensitiveSearchAdapter, self
        ).get_content(obj).lower()
I am using MySQL as my database. When I manually inspect the data in the index, I don't see any duplication of data. And if I do a normal contains query for "[email protected]" I only get one result.
Sorry this is not the best issue as I don't know how to provide a reduced case here. Maybe there is a forehead thunker here that sticks out though?
Thanks so much for your work on this project, it's really awesome. I'm in the process of replacing haystack + solr with this, and if I can just get this weird case figured out it will greatly reduce the number of moving pieces in my system.
One idea I had was, could this be some weird interaction between the @ symbol and the query used in the MySQL backend? Just a WAG, but thought I'd throw it out there.
Okay I think I'm on the right track with my @ symbol theory. If I change:
backends.py
RE_MYSQL_ESCAPE_CHARS = re.compile(r'["()><~*+-]', re.UNICODE)
to (add an @)
RE_MYSQL_ESCAPE_CHARS = re.compile(r'["()><~*+-]@', re.UNICODE)
And then enclose my actual search query text in " " I get the result I am expecting, exactly one result for "[email protected]".
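For anyone following along, here is a minimal sketch of what each pattern actually matches, using only the standard `re` module. The first two patterns are copied from the comment above. The third, with the `@` moved inside the character class, is only an assumption about one possible fix and is not taken from django-watson. The `strip_special` helper and the sample address are made up for illustration.

```python
import re

# Pattern currently in backends.py, as quoted above.
ORIGINAL = re.compile(r'["()><~*+-]', re.UNICODE)

# Pattern as written in the comment: the "@" sits OUTSIDE the brackets, so it
# only matches one of the special characters immediately followed by "@".
AS_WRITTEN = re.compile(r'["()><~*+-]@', re.UNICODE)

# Hypothetical variant (an assumption, not the project's code): "@" inside the
# class, with "-" kept last so it stays a literal rather than forming a range.
AT_IN_CLASS = re.compile(r'["()><~*+@-]', re.UNICODE)


def strip_special(pattern, text):
    """Replace every match of `pattern` in `text` with a single space."""
    return pattern.sub(" ", text)


sample = '"jane.doe@example.com"'  # made-up address, wrapped in double quotes

for name, pattern in [
    ("original", ORIGINAL),
    ("as written", AS_WRITTEN),
    ("@ in class", AT_IN_CLASS),
]:
    print("{:<12} {!r}".format(name, strip_special(pattern, sample)))
# original     ' jane.doe@example.com '
# as written   '"jane.doe@example.com"'
# @ in class   ' jane.doe example.com '
```

Note that with the pattern as written, the double quotes are no longer replaced at all. That may be why the quoted search reaches MySQL as a phrase, though this is a guess, since the sketch does not show how the backend uses the pattern.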
According to the MySQL docs, this is an exact phrase match, I believe. Relevant SO answer: https://stackoverflow.com/questions/8961148/mysql-match-against-when-searching-e-mail-addresses
I'm in a situation where I want flexibility: users can search on name or email, so for email I want an exact match, but I want broader results when searching on name.
I still don't get why only some particular usernames (emails) trigger these very broad search results, whereas others do not. But I can live with that if I can just work around the issue.
So I think I just need to do some pre-processing on my search text and, if I detect something email-like in it, auto-enclose it in quotes (my users will not have the savvy to do this themselves).
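The pre-processing idea could look something like the sketch below. `quote_email_terms` is a hypothetical helper, not part of django-watson, and the "contains an @" check is a deliberately crude stand-in for real email detection. It also assumes the backend lets double quotes through to MySQL, which may not hold, since the `RE_MYSQL_ESCAPE_CHARS` pattern quoted earlier lists `"` among its characters.

```python
def quote_email_terms(search_text):
    """Wrap each whitespace-separated term containing "@" in double quotes.

    Terms that are already fully quoted are left untouched; all other
    terms pass through unchanged so name searches stay broad.
    """
    terms = []
    for term in search_text.split():
        already_quoted = (
            len(term) >= 2 and term.startswith('"') and term.endswith('"')
        )
        if "@" in term and not already_quoted:
            # Drop any stray quote characters before re-wrapping the term.
            term = '"{}"'.format(term.strip('"'))
        terms.append(term)
    return " ".join(terms)


print(quote_email_terms('jane jane.doe@example.com'))
# jane "jane.doe@example.com"
```

The returned string would then be passed to the search call in place of the raw user input.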
Can I have a pull request to exclude that character? Sounds like a worthy bug fix.
(In reply to Ian Fitzpatrick's comment of 25 April 2018 at 21:05, shown above.)
(Sorry I took so long to reply, I've been snowed under at work)
(In reply to Dave Hall's comment of 17 May 2018 at 17:32, shown above.)
Sure thing, I'll try and get something to you next week.