wger
wger copied to clipboard
Use postgres' full text search
Use case
Get better results for exercises and specially ingredients with full text search
Proposal
If using postgres, we should use its full text search capabilities so that we get better results and smooth out typos (search in exercises.api.views and nutrition.api.views). A short check of the connection engine should make easy to use the current filter if that's not the case. We should also document how to add an index to the db.
https://docs.djangoproject.com/en/3.2/ref/contrib/postgres/search/
from django.db import connection
if 'postgres' in connection.settings_dict['ENGINE']:
...
else:
...
I recently worked on integrating full text search using postgres. I could take a look but I had a question. Is the project open to using database of multiple types? If it's going to be hosted using Postgres only then it would be better to generate a migration where we generate search vectors on fields (using SearchVectorField) that will remain indexed and the searches will operate against those fields. If not we can just use the default searchvector strategy to filter the text where we have to annotate to leverage the postgres full text search.
While the docker images and the docs use postgres, I wouldn't want to make it the only option (and e.g. during development I only use sqlite as it is very easy to work with). Does django support writing backend specific migrations? That would be probably the easiest
Not sure about backend specific migrations but django does have post migration signals as far as I know. So based on this signal we can check the db type and run our own custom migrations. better yet add our own migration files which run some aspect of the code if it has a specific db backend type.
I've looked around a bit and found... little, but I think it's possible to detect, e.g. this answer on stackoverflow or we could probably do something like this and filter out non-postgres dbs
At the end of the day however it's all down to having your custom operations that you either write a class for or have it invoked as a series for forward, backward functions with conditions checking for db type. Also if we do write a migrations for this we would also need to define a column where would store the tsvector data types (which need to be indexed as well for better performance) of the column against which we are going to query with our search parameters. If the app isn't primarily running a posgres engine it will be tricky to add model fields dynamically based on db engine types. I might not have all the answers regarding this but this what I know thus far.
https://pganalyze.com/blog/full-text-search-django-postgres
Some prototyping can be found in the branch feature/pg-fulltext-search
Is anyone working on this issue? If not, I want to work on it.
Sure @uzza1hossain , go ahead
@rolandgeider, you tried to implement using SearchVectorField, should I also use this approach, or SearchVector enough? Also, which column should I use as a search_column? I can't find any column without 'name' relevant to the Ingredient model (table).
For the Exercise model (table) 'name' and 'description' column is appropriate for the search_column.
SearchVector might be enough, I was following the tutorial from pganalyze.com. For the ingredients we will just index the name, but should keep this flexible since we might add categories, brands or similar in the future. For exercises I'd say name, alias and description
Hey, whats the status? if its open i can give it a try
Hi @vbsh123 I don't think much has happened since then. If you want to try, go ahead, this would be really useful!
Hi, I would love to work on this issue. This would be my first as a contributor dev. I've had some experience working with PostgreSQL on the back-end of web apps. I look forward to giving it a shot, if that's alright.
@eriklolson it's yours
@eriklolson Hi! Any news? :)
@rolandgeider Hi there! As a relative newbie to Django I've been doing some tutorials to familiarize myself with the ins and outs of the framework. I focused on those that use Postgres database. I've also read the Postgres full text search documentation and that for Django. I'm now nearly finished with a full text search tutorial. After this, I will be fully prepared to tackle this issue in a more effective manner.
I should also mention I was able to rebase the master thread onto the Postgres branch, after resolving some conflicts. I did this because the Postgres branch was very behind the master.
awesome! If I can assist you in any way, just ping me
I also wanted to mention that a rebase would be needed, but you already did that :)
There is a PR that simply uses a trigram search, which seems to return better results for our use case such as simple typos, different spellings (pullup / pull-up / pull up).
For posterity, there is this mostly working code for the "real thing" will full text search in this commit: https://github.com/wger-project/wger/pull/1594/commits/8b0e3870a37754ea2229853a83c64d1a67dab1ab