
Litesearch POC

Open · adrienpoly opened this issue 2 years ago • 13 comments

This is mostly a POC at this time to test the Litesearch capabilities.

It is very straightforward to integrate with Active Record, and each field can be given a weight. Replacing Meilisearch here would give us something easier to install and deploy. Maybe we can keep the vector-based recommendations from #19.
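To make the weight idea concrete, here is a toy scorer in plain Ruby. It is not how Litesearch ranks results (Litesearch sits on SQLite's FTS5, whose built-in ranking function is BM25); it only shows the effect of giving one field a heavier weight than another. The field names and the weights are hypothetical.

```ruby
# Toy illustration of per-field weights; this is NOT how Litesearch ranks.
# A query term found in a heavier field contributes more to the score.

# Hypothetical weights: a title match counts 10x a description match.
FIELD_WEIGHTS = { title: 10, description: 1 }.freeze

# Lowercase the text and split it into simple alphanumeric words.
def tokenize(text)
  text.to_s.downcase.scan(/[[:alnum:]]+/)
end

# Sum, over every field, weight * number of query-term occurrences in it.
def weighted_score(doc, query, weights = FIELD_WEIGHTS)
  terms = tokenize(query)
  weights.sum do |field, weight|
    tokens = tokenize(doc[field])
    weight * terms.sum { |term| tokens.count(term) }
  end
end

# Keep documents with a positive score, highest score first.
def rank(docs, query, weights = FIELD_WEIGHTS)
  docs.map { |doc| [doc, weighted_score(doc, query, weights)] }
      .select { |_doc, score| score.positive? }
      .sort_by { |_doc, score| -score }
      .map(&:first)
end

talks = [
  { title: "Intro to Hotwire", description: "Rails, Rails and more Rails" },
  { title: "Rails at scale", description: "Scaling a monolith" }
]
p rank(talks, "rails").map { |talk| talk[:title] }
# prints ["Rails at scale", "Intro to Hotwire"]
```

A single title match (10 points) outranks three description matches (3 points), which is the kind of behaviour a title weight is meant to produce.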

The current limitation is that I cannot search on speaker names, as through associations are not yet supported: https://github.com/oldmoe/litestack/issues/45

adrienpoly, Oct 18 '23

Litesearch POC

Stats

| Language | Score | Trend |
| --- | --- | --- |
| Ruby | 68.45 (from 68.83) | 📉 -0.55% |
| JavaScript | 84.95 (from 84.95) | 📉 0.0% |

Trends

| Language | Most Improved | Largest Declines |
| --- | --- | --- |
| Ruby | app/models/talk.rb | app/controllers/talks_controller.rb |
| JavaScript | No decreases for JavaScript detected | No increases for JavaScript detected |

To-Dos

| Language | New to Refactor | Refactored |
| --- | --- | --- |
| Ruby | No new To-Dos for Ruby detected | No completed To-Dos for Ruby detected |
| JavaScript | No new To-Dos for JavaScript detected | No completed To-Dos for JavaScript detected |

useattractor[bot], Oct 18 '23

Regarding similarity search: instead of vector similarity matching, and since Litesearch sorts by rank by default, have you thought about extracting the most significant words from the current video's title and description and then running an OR search with them? The resulting set would be sorted with the results closest to the query first. The trick would be to manually remove what could be considered stop words (Litesearch currently has no facility for doing so).
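The suggestion can be sketched in plain Ruby: pull the most frequent non-stop-words out of a talk's title and description and join them with OR. The stop-word list, the minimum word length and the max_terms default are assumptions made for illustration; they are not Litesearch features.

```ruby
require "set"

# Hypothetical, deliberately tiny stop-word list (Litesearch ships none).
STOP_WORDS = Set.new(%w[
  a an and are as at be by for from how in is it of on or that the this to we with you your
]).freeze

# Count the words that survive the filters and keep the most frequent ones.
def significant_words(text, max_terms: 8, min_length: 3)
  counts = Hash.new(0)
  text.to_s.downcase.scan(/[[:alnum:]]+/) do |word|
    next if word.length < min_length || STOP_WORDS.include?(word)

    counts[word] += 1
  end
  # Most frequent first; ties broken alphabetically so the output is stable.
  counts.sort_by { |word, count| [-count, word] }.first(max_terms).map(&:first)
end

# Join a talk's significant words into an FTS5-style OR query.
# Words are lowercased, so none of them can be read as the OR/AND/NOT keywords.
def or_query(title, description, max_terms: 8)
  significant_words("#{title} #{description}", max_terms: max_terms).join(" OR ")
end

p or_query("Building a search engine with SQLite",
           "How we built search on SQLite and Rails", max_terms: 3)
# prints "search OR sqlite OR building"
```

The resulting string could then be passed to the model's search method (for example Talk.search(query)), assuming that method forwards FTS5 query syntax unchanged; the current talk itself would still need to be filtered out of the results.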

oldmoe, Oct 26 '23

Yeah, I thought about that, but going that route I feel I would be reinventing a search engine. This is where the SQLite + Meilisearch combo was interesting, as Meilisearch already provides all of this. My pain point with Meilisearch is that upgrades are not easy. I'll see if a simple Litesearch setup is good enough, especially once I have some tag filters available.

adrienpoly, Oct 28 '23

I can try to hide most of the complexity and offer a Model#similar method on AR objects; that could be a nice abstraction.

oldmoe, Oct 28 '23

Litesearch now has a similar method on the index, and on any AR or Sequel model object you can call something like video.similar(limit) to get a list of similar videos ordered by similarity. The limit defaults to 10 entries if not supplied. This is not in the released gem yet; I would love to see whether the (admittedly naïve) approach is useful on actual data.

oldmoe, Nov 02 '23

@oldmoe great! Given the underlying job in https://github.com/adrienpoly/rubyvideo/pull/64, it will be easy to test it in real life. I will hopefully look at it soon.

adrienpoly, Nov 02 '23

Litesearch POC

Stats

| Language | Score | Trend |
| --- | --- | --- |
| Ruby | 68.81 (from 69.16) | 📉 -0.51% |
| JavaScript | 84.95 (from 84.95) | 📉 0.0% |

Trends

| Language | Most Improved | Largest Declines |
| --- | --- | --- |
| Ruby | app/models/talk.rb | app/controllers/talks_controller.rb |
| JavaScript | No decreases for JavaScript detected | No increases for JavaScript detected |

To-Dos

| Language | New to Refactor | Refactored |
| --- | --- | --- |
| Ruby | No new To-Dos for Ruby detected | No completed To-Dos for Ruby detected |
| JavaScript | No new To-Dos for JavaScript detected | No completed To-Dos for JavaScript detected |

useattractor[bot], Nov 02 '23

@oldmoe I tried running it from the master branch, but I am getting this error:

gems/litestack-5d383d83c767/lib/litestack/litedb.rb:131:in `initialize': no such table: talks_search_idx_row (SQLite3::SQLException)

I tried running this in the console:

    Talk.rebuild_index!

but it returns the same error.

If I roll back to the latest official release, Litesearch works OK (but without similarity search).

adrienpoly, Nov 02 '23

OK, I made some progress: I had to go back to the previous version, drop the index, then switch back to master and rebuild the index.

Now I am getting this error: [screenshot: CleanShot 2023-11-02 at 22 39 35@2x]

adrienpoly, Nov 02 '23

Thanks for trying it out. It turns out this is due to the tokenizer being a trigram one. I am looking into how to avoid tokens that would cause syntax errors. Could you please send me the data for the particular object you are testing?
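One common way to keep generated queries from hitting FTS5 syntax errors is to quote every term: in FTS5 a double-quoted string is treated as a phrase, and an embedded double quote is escaped by doubling it. The sketch below shows that approach in plain Ruby. It is an assumption about how such a fix could look, not necessarily the change that was pushed, and the helper names are hypothetical.

```ruby
# Wrap a term as an FTS5 string literal: surround it with double quotes
# and escape any embedded double quote by doubling it.
def fts5_quote(term)
  '"' + term.to_s.gsub('"', '""') + '"'
end

# Build an OR query in which every term is quoted, so punctuation, spaces
# or keywords such as AND/OR/NOT inside a term cannot break the syntax.
# Blank terms and duplicates are dropped first.
def safe_or_query(terms)
  terms.map(&:to_s)
       .reject { |term| term.strip.empty? }
       .uniq
       .map { |term| fts5_quote(term) }
       .join(" OR ")
end

puts safe_or_query(["ruby-on-rails", "AND", "", "c++"])
# prints "ruby-on-rails" OR "AND" OR "c++"
```

Quoting keeps the query well-formed; it does not by itself make trigram-derived terms good similarity signals.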

oldmoe, Nov 04 '23

I have just pushed a change that should fix the issue, but I am not sure about the quality of similarity search using the terms stored in a trigram-tokenized index; a porter or unicode tokenizer will yield much better similarity results. I think I will need to reconsider how similarity is implemented for trigram indexes specifically.
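A rough plain-Ruby approximation of trigram tokenization shows why similarity built from a trigram index's terms can be noisy: every run of three characters becomes a term, so unrelated words can share most of their terms, whereas a porter tokenizer would index rails and trails as the distinct stems rail and trail. The trigrams helper is hypothetical and ignores tokenizer options; it only approximates SQLite FTS5's trigram tokenizer with its default case folding.

```ruby
# Approximation of trigram tokenization: lowercase the text, then every run
# of three consecutive characters, spaces and punctuation included, is a token.
def trigrams(text)
  text.downcase.chars.each_cons(3).map(&:join)
end

p trigrams("Ruby on")
# prints ["rub", "uby", "by ", "y o", " on"]

# "rails" and "trails" mean different things but share three of their trigrams.
p trigrams("rails") & trigrams("trails")
# prints ["rai", "ail", "ils"]
```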

oldmoe, Nov 04 '23

The data can be found in /data: https://github.com/adrienpoly/rubyvideo/tree/main/data

It is all the videos.yml files, which are indexed by the Talk model.

adrienpoly, Nov 04 '23

If you run this branch, a simple rails db:create db:seed followed by bin/dev should get you up and running.

Then you can update the related_talks method to use Litesearch's similar.

adrienpoly, Nov 04 '23

Closing for now.

adrienpoly, Jun 11 '24