ransack icon indicating copy to clipboard operation
ransack copied to clipboard

Full-text search for Postgres

Open thanhtoan1196 opened this issue 4 years ago • 20 comments

Please support "Full-text search for MySQL and Postgres"

thanhtoan1196 avatar Jul 14 '20 14:07 thanhtoan1196

Looks like a pretty big feature, could you add a description of how it would work?

scarroll32 avatar Nov 27 '20 19:11 scarroll32

@seanfcarroll there is a great episode on it here: http://railscasts.com/episodes/343-full-text-search-in-postgresql. We can start with postgres only.

I've also been trying to get all of this to work and be more like Elastic Search/Algolia while keeping things simple for my team at work.

I'd love to contribute back to ransack and add some of these features.

This is the documentation for it in PG: https://www.postgresql.org/docs/current/textsearch.html -- it is pretty terse and hard to grep I find, but through.

There is also a fuzzy search feature part of this: https://www.postgresql.org/docs/13/fuzzystrmatch.html

yagudaev avatar Dec 12 '20 00:12 yagudaev

Out of curiosity, I looked at mysql and it seems their full-text search is even simpler than pg. https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html.

Also wanted to note that Django actually has full-text with pg built right in https://docs.djangoproject.com/en/3.1/topics/db/search/

yagudaev avatar Dec 28 '20 20:12 yagudaev

An interesting blog post on full text search with Postgres and Rails https://pganalyze.com/blog/full-text-search-ruby-rails-postgres

scarroll32 avatar Feb 24 '21 19:02 scarroll32

@seanfcarroll what do you think about starting off simpler, we can provide a typo tolerant predict like so:

*_fuzzy_matches - uses Levenstein distance to compare strings if the distance is smaller than 3 it matches

An example would be something like this:

# | last_name |
# -------------
#  smith

User.ransacker(last_name_fuzzy_matches: 'smth') # matches since the levensthin distance is 1
User.ransacker(last_name_fuzzy_matches: 'smithsons') # no match, distance is 4

Postgres Reference

This is very similar to LIKE in performance, in that it will require a table scan. It is possible to make it really fast by using an inverted index. But that is a bigger topic that I think we can avoid for the time being.

For MySQL, I wasn't able to find a native implementation of Levenshtein, but I found this: https://gist.github.com/Kovah/df90d336478a47d869b9683766cff718. Thus we could include it in a database migration to make this functionality work for MySQL too.

It is possible to configure how distance is calculated and tolerance, I'm not sure what would be a good configuration. There are also Soundex and Doublemetahpone in Postgres which convert a string into a hash representation and can be used for quicker matching.

yagudaev avatar Mar 02 '21 04:03 yagudaev

@yagudaev I think the fuzzy match could be a great start, and to make it even smaller it could start with Postgres first and then add MySQL later.

A couple of interesting posts on the topic:

  • https://www.rdegges.com/2013/easy-fuzzy-text-searching-with-postgresql/
  • https://severalnines.com/database-blog/full-text-searches-mysql-good-bad-and-ugly

scarroll32 avatar Mar 02 '21 21:03 scarroll32

The https://github.com/textacular/textacular library might be a great way to handle full text search under Postgres. WDYT @gregmolnar ?

scarroll32 avatar Mar 31 '21 06:03 scarroll32

Splitting out the MySQL component of this into a new issue, as there will be a different solution.

https://github.com/activerecord-hackery/ransack/issues/1219

scarroll32 avatar Mar 31 '21 06:03 scarroll32

Thanks @seanfcarroll I've started playing around with adding this as a new predicate to Ransack. I'll try to put a draft up at least in the next week or so to get us started.

Some of the search algorithms require persisting a reverse index to get reasonable performance. Postgres (and I'm sure other DBs) allow building this index on the fly, which means part of the query time is spent building an index. In practical terms, I don't know where it starts to go crazy 1K? 10K? 100K? 1M? I'll try it out and see.

Are there any performance benchmarks for ransack to keep in mind?

yagudaev avatar Mar 31 '21 17:03 yagudaev

The https://github.com/textacular/textacular library might be a great way to handle full text search under Postgres. WDYT @gregmolnar ?

I agree.

gregmolnar avatar Mar 31 '21 17:03 gregmolnar

Maybe I'm missing something but I always thought that gem was no longer maintained. There is however pg_search: https://github.com/Casecommons/pg_search -- the name is confusing it should be pg_fts as in access to full-text search features in postgres.

Do you like the API better for textacular?

yagudaev avatar Mar 31 '21 17:03 yagudaev

Maybe I'm missing something but I always thought that gem was no longer maintained.

What makes you think that?

gregmolnar avatar Mar 31 '21 17:03 gregmolnar

Quick Start Rails 3, Rails 4 In the project's Gemfile add

Requirements Ruby 1.9.2

Copyrights 2011

All these amounts to red flags that say "it hasn't been cared for in a long time". Compare this to the pg_search gem documentation which also started in 2011, but kept getting updated. It is also better laid out so it feels more "trustworthy" to me. I realize those are subjective, thus I may have misread the optics of it.

yagudaev avatar Mar 31 '21 19:03 yagudaev

So if a gem supports old versions of Rails and Ruby that's a red flag? I am the maintainer of textacular by the way and even though not too often, but it gets updated. Considering it is a pretty simple tool, I don't see the point of updating it more often just for the sake of updates.

It is also better laid out so it feels more "trustworthy" to me.

This is indeed subjective. Textacular was written by tenderlove by the way and I wouldn't have trust issues with anything he touches :)

gregmolnar avatar Mar 31 '21 19:03 gregmolnar

So if a gem supports old versions of Rails and Ruby that's a red flag?

More so that it doesn't list the newest versions of Rails and the newest versions of ruby. So it was unclear if it still works for Rails 6.1+, Ruby 2.7+.

I'll help update the documentation. Open source is hard, so thank you for the contribution :). Having options is fantastic too.

Any objective criteria we should consider when picking between pg_search and textacular to integrate FTS with ransack? I was going to defer the choice completely and focus on very basic SQL syntax usage to keep it simple.

yagudaev avatar Apr 01 '21 01:04 yagudaev

More so that it doesn't list the newest versions of Rails and the newest versions of ruby. So it was unclear if it still works for Rails 6.1+, Ruby 2.7+.

Rails > 5.0 includes 6.1+

gregmolnar avatar Apr 01 '21 10:04 gregmolnar

Any objective criteria we should consider when picking between pg_search and textacular to integrate FTS with ransack? I was going to defer the choice completely and focus on very basic SQL syntax usage to keep it simple.

It's a tricky one, both libraries look good. To look at advantages:

pg_search

  • Has extensive documentation
  • Explicitly added to models via include PgSearch::Model rather than adding to every model 🎉
  • Rake task to rebuild indexes
  • 670 stars

Thinking about how this dependency would be used outside of Ransack in a project, I quite like the way the model is only extended with the include.

textacular

  • Maintained by @gregmolnar who is also a Ransack maintainer 🎉
  • 814 stars

@yagudaev if you are planning on doing a PR for this, then the final decision would rest with you - whatever you prefer.

One final consideration might be to compare the bundle size and memory usage of the two gems. I run my project on Heroku, and so like to keep memory down as much as possible. This means purging any unneeded gems and going for smaller ones where possible.

scarroll32 avatar Jun 11 '21 08:06 scarroll32

@seanfcarroll thanks for the info :).

I've got my hands full with two (soon to be three) React projects. Might be a while until I get a chance to do more ruby and need this search capability. Once I do, I'll post an update here 😁

yagudaev avatar Jun 11 '21 19:06 yagudaev

Just a note to future searchers that it's easy to use ransack in combination with pg_search (or other gems that add custom scopes to your model) -- just add the scope you defined with pg_search to the ransackable_scopes class method on your model. More explicitly laid out in this blog post (thanks to the author of that!).

This doesn't perhaps solve the original feature request, since the functionality isn't provided by ransack directly. It's possible that it'd make sense to add some bare-bones FTS functionality to ransack, but some of the more advanced features exposed by pg_search and others wouldn't fit particularly well into ransack, IMO.

mattdb avatar Oct 17 '22 17:10 mattdb

It's possible that it'd make sense to add some bare-bones FTS functionality to ransack, but some of the more advanced features exposed by pg_search and others wouldn't fit particularly well into ransack, IMO.

I agree that it would make sense to have FTS as an add-on: either through a third party gem or a new Ransack extension gem, which could be specific by database. Ransack is already big enough and it would be good to have some features as add ons.

scarroll32 avatar Nov 25 '23 14:11 scarroll32