Litesearch POC
This is mostly a POC for now, to test Litesearch's capabilities.
It is very straightforward to integrate with Active Record, and the ability to set a weight is a plus. Being able to replace Meilisearch with something easier to install and deploy would be nice. Maybe we can keep the vector-based recommendations from #19.
The current limitation is that I cannot search on speaker names, as through associations are not yet supported: https://github.com/oldmoe/litestack/issues/45
Litesearch POC
Stats
| Language | Score | Trend |
|---|---|---|
| Ruby | 68.45 (from 68.83) | -0.55% |
| JavaScript | 84.95 (from 84.95) | 0.0% |
Trends
| Language | Most Improved | Largest Declines |
|---|---|---|
| Ruby | app/models/talk.rb | app/controllers/talks_controller.rb |
| JavaScript | No decreases for JavaScript detected | No increases for JavaScript detected |
To-Dos
| Language | New to Refactor | Refactored |
|---|---|---|
| Ruby | No new To-Dos for Ruby detected | No completed To-Dos for Ruby detected |
| JavaScript | No new To-Dos for JavaScript detected | No completed To-Dos for JavaScript detected |
Regarding similarity search: instead of similarity matching, and since Litesearch sorts by rank by default, did you think about extracting the most significant words from the current video's title and description and then doing an OR search with them? The result set would be sorted with the closest matches to the query first. The trick would be to manually remove what could be considered stop words (Litesearch currently has no facility for doing so).
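The keyword idea above could be sketched roughly as follows. This is only an illustration in plain Ruby: the `STOP_WORDS` list, the `significant_terms` and `or_query` helpers, and the `max_terms` default are all made up for this example and are not part of Litesearch. Words shorter than three characters are dropped on the assumption that they add little to the ranking.

```ruby
require "set"

# Illustrative stop-word list (an assumption); a real one would be longer
# and tuned to the talk titles and descriptions.
STOP_WORDS = Set.new(%w[and are for from how the this that with you your])

# Hypothetical helper: returns up to `max_terms` distinct words,
# most frequent first, ties broken alphabetically.
def significant_terms(text, max_terms: 8)
  words = text.downcase.scan(/[a-z0-9]+/)
  words = words.reject { |w| w.length < 3 || STOP_WORDS.include?(w) }
  words.tally
       .sort_by { |word, count| [-count, word] }
       .first(max_terms)
       .map(&:first)
end

# Hypothetical helper: joins the significant terms into an OR query string.
def or_query(title, description, max_terms: 8)
  significant_terms("#{title} #{description}", max_terms: max_terms).join(" OR ")
end
```

The resulting string could then be handed to the model's search method, with the current talk filtered out of the results since it will naturally match its own keywords.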
Yeah, I thought about that, but going that route I feel I'll be reinventing a search engine. This is where the SQLite + Meilisearch combo was interesting, as Meilisearch already brings all of this. The pain point I have with Meilisearch is that upgrades are not really easy. I'll see if a simple Litesearch setup is good enough, especially once I have some tag filters available.
I can try to hide much of the complexity and offer a model#similar method on AR objects; that could be a nice abstraction.
Litesearch now has a similar method on the index, and on any AR or Sequel model object you can do something like video.similar(limit) to get a list of similar videos ordered by similarity. The limit defaults to 10 entries if not supplied. This is not in the released gem yet. I would love to see whether the (admittedly naïve) approach is useful on actual data.
@oldmoe great! Given the underlying work in https://github.com/adrienpoly/rubyvideo/pull/64, it will be easy to test it in real life. I will look at it soon, hopefully.
Litesearch POC
Stats
| Language | Score | Trend |
|---|---|---|
| Ruby | 68.81 (from 69.16) | -0.51% |
| JavaScript | 84.95 (from 84.95) | 0.0% |
Trends
| Language | Most Improved | Largest Declines |
|---|---|---|
| Ruby | app/models/talk.rb | app/controllers/talks_controller.rb |
| JavaScript | No decreases for JavaScript detected | No increases for JavaScript detected |
To-Dos
| Language | New to Refactor | Refactored |
|---|---|---|
| Ruby | No new To-Dos for Ruby detected | No completed To-Dos for Ruby detected |
| JavaScript | No new To-Dos for JavaScript detected | No completed To-Dos for JavaScript detected |
@oldmoe I tried to run it from the master branch, but I am getting this error:
gems/litestack-5d383d83c767/lib/litestack/litedb.rb:131:in `initialize': no such table: talks_search_idx_row (SQLite3::SQLException)
I tried running this in the console:
Talk.rebuild_index!
but it returns the same error.
If I roll back to the latest official release, Litesearch works OK (but without similarity search).
OK, I made some progress: I had to go back to the previous version, drop the index, then switch back to master and rebuild the index.
Now I am getting this error
Thanks for trying it out. It turns out this is due to the tokenizer being a trigram one. I am looking into how to avoid tokens that would cause syntax errors. Could you please send me the data for the particular object you are testing?
I have just pushed a change that should fix the issue, but I am not sure about the quality of the similarity search when using the terms stored in a trigram-tokenized index; a porter or unicode tokenizer will yield much better similarity results. I think I will need to reconsider how similarity is implemented for trigram indexes specifically.
The data can be found in /data: https://github.com/adrienpoly/rubyvideo/tree/main/data
All the videos.yml files there are indexed by the Talk model.
If you run this branch, a simple rails db:create db:seed followed by bin/dev should get you up and running.
Then you can update the related_talks method to use Litesearch's similar.
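For the related_talks change mentioned above, a minimal sketch might look like this. The `RelatedTalks` module name and the default limit of 6 are assumptions for illustration only; the only Litesearch call relied on is `similar(limit)` as described earlier in this thread. The `respond_to?` guard covers the released gem, where `similar` is not available yet.

```ruby
# Hypothetical mixin for the Talk model (the name is an assumption).
module RelatedTalks
  # Returns talks similar to this one, or an empty list when the
  # installed Litesearch version does not provide `similar`.
  def related_talks(limit: 6)
    return [] unless respond_to?(:similar)

    similar(limit)
  end
end
```

Including the module in the model keeps the view code unchanged whether or not the similarity search is available.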
closing for now