ransack
ransack copied to clipboard
Full-text search for Postgres
Please support "Full-text search for MySQL and Postgres"
Looks like a pretty big feature, could you add a description of how it would work?
@seanfcarroll there is a great episode on it here: http://railscasts.com/episodes/343-full-text-search-in-postgresql. We can start with postgres only.
I've also been trying to get all of this to work and be more like Elastic Search/Algolia while keeping things simple for my team at work.
I'd love to contribute back to ransack and add some of these features.
This is the documentation for it in PG: https://www.postgresql.org/docs/current/textsearch.html -- it is pretty terse and hard to grep I find, but through.
There is also a fuzzy search feature part of this: https://www.postgresql.org/docs/13/fuzzystrmatch.html
Out of curiosity, I looked at mysql and it seems their full-text search is even simpler than pg. https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html.
Also wanted to note that Django actually has full-text with pg built right in https://docs.djangoproject.com/en/3.1/topics/db/search/
An interesting blog post on full text search with Postgres and Rails https://pganalyze.com/blog/full-text-search-ruby-rails-postgres
@seanfcarroll what do you think about starting off simpler, we can provide a typo tolerant predict like so:
*_fuzzy_matches - uses Levenstein distance to compare strings if the distance is smaller than 3 it matches
An example would be something like this:
# | last_name |
# -------------
# smith
User.ransacker(last_name_fuzzy_matches: 'smth') # matches since the levensthin distance is 1
User.ransacker(last_name_fuzzy_matches: 'smithsons') # no match, distance is 4
This is very similar to LIKE
in performance, in that it will require a table scan. It is possible to make it really fast by using an inverted index. But that is a bigger topic that I think we can avoid for the time being.
For MySQL, I wasn't able to find a native implementation of Levenshtein, but I found this: https://gist.github.com/Kovah/df90d336478a47d869b9683766cff718. Thus we could include it in a database migration to make this functionality work for MySQL too.
It is possible to configure how distance is calculated and tolerance, I'm not sure what would be a good configuration. There are also Soundex and Doublemetahpone in Postgres which convert a string into a hash representation and can be used for quicker matching.
@yagudaev I think the fuzzy match could be a great start, and to make it even smaller it could start with Postgres first and then add MySQL later.
A couple of interesting posts on the topic:
- https://www.rdegges.com/2013/easy-fuzzy-text-searching-with-postgresql/
- https://severalnines.com/database-blog/full-text-searches-mysql-good-bad-and-ugly
The https://github.com/textacular/textacular library might be a great way to handle full text search under Postgres. WDYT @gregmolnar ?
Splitting out the MySQL component of this into a new issue, as there will be a different solution.
https://github.com/activerecord-hackery/ransack/issues/1219
Thanks @seanfcarroll I've started playing around with adding this as a new predicate to Ransack. I'll try to put a draft up at least in the next week or so to get us started.
Some of the search algorithms require persisting a reverse index to get reasonable performance. Postgres (and I'm sure other DBs) allow building this index on the fly, which means part of the query time is spent building an index. In practical terms, I don't know where it starts to go crazy 1K? 10K? 100K? 1M? I'll try it out and see.
Are there any performance benchmarks for ransack to keep in mind?
The https://github.com/textacular/textacular library might be a great way to handle full text search under Postgres. WDYT @gregmolnar ?
I agree.
Maybe I'm missing something but I always thought that gem was no longer maintained. There is however pg_search: https://github.com/Casecommons/pg_search -- the name is confusing it should be pg_fts as in access to full-text search features in postgres.
Do you like the API better for textacular?
Maybe I'm missing something but I always thought that gem was no longer maintained.
What makes you think that?
Quick Start Rails 3, Rails 4 In the project's Gemfile add
Requirements Ruby 1.9.2
Copyrights 2011
All these amounts to red flags that say "it hasn't been cared for in a long time". Compare this to the pg_search gem documentation which also started in 2011, but kept getting updated. It is also better laid out so it feels more "trustworthy" to me. I realize those are subjective, thus I may have misread the optics of it.
So if a gem supports old versions of Rails and Ruby that's a red flag? I am the maintainer of textacular by the way and even though not too often, but it gets updated. Considering it is a pretty simple tool, I don't see the point of updating it more often just for the sake of updates.
It is also better laid out so it feels more "trustworthy" to me.
This is indeed subjective. Textacular was written by tenderlove by the way and I wouldn't have trust issues with anything he touches :)
So if a gem supports old versions of Rails and Ruby that's a red flag?
More so that it doesn't list the newest versions of Rails and the newest versions of ruby. So it was unclear if it still works for Rails 6.1+, Ruby 2.7+.
I'll help update the documentation. Open source is hard, so thank you for the contribution :). Having options is fantastic too.
Any objective criteria we should consider when picking between pg_search
and textacular
to integrate FTS with ransack? I was going to defer the choice completely and focus on very basic SQL syntax usage to keep it simple.
More so that it doesn't list the newest versions of Rails and the newest versions of ruby. So it was unclear if it still works for Rails 6.1+, Ruby 2.7+.
Rails > 5.0
includes 6.1+
Any objective criteria we should consider when picking between pg_search and textacular to integrate FTS with ransack? I was going to defer the choice completely and focus on very basic SQL syntax usage to keep it simple.
It's a tricky one, both libraries look good. To look at advantages:
pg_search
- Has extensive documentation
- Explicitly added to models via
include PgSearch::Model
rather than adding to every model 🎉 - Rake task to rebuild indexes
- 670 stars
Thinking about how this dependency would be used outside of Ransack in a project, I quite like the way the model is only extended with the include
.
textacular
- Maintained by @gregmolnar who is also a Ransack maintainer 🎉
- 814 stars
@yagudaev if you are planning on doing a PR for this, then the final decision would rest with you - whatever you prefer.
One final consideration might be to compare the bundle size and memory usage of the two gems. I run my project on Heroku, and so like to keep memory down as much as possible. This means purging any unneeded gems and going for smaller ones where possible.
@seanfcarroll thanks for the info :).
I've got my hands full with two (soon to be three) React projects. Might be a while until I get a chance to do more ruby and need this search capability. Once I do, I'll post an update here 😁
Just a note to future searchers that it's easy to use ransack in combination with pg_search
(or other gems that add custom scopes to your model) -- just add the scope you defined with pg_search to the ransackable_scopes
class method on your model. More explicitly laid out in this blog post (thanks to the author of that!).
This doesn't perhaps solve the original feature request, since the functionality isn't provided by ransack directly. It's possible that it'd make sense to add some bare-bones FTS functionality to ransack, but some of the more advanced features exposed by pg_search and others wouldn't fit particularly well into ransack, IMO.
It's possible that it'd make sense to add some bare-bones FTS functionality to ransack, but some of the more advanced features exposed by pg_search and others wouldn't fit particularly well into ransack, IMO.
I agree that it would make sense to have FTS as an add-on: either through a third party gem or a new Ransack extension gem, which could be specific by database. Ransack is already big enough and it would be good to have some features as add ons.