Store vector embeddings for talks, and display related talks
This PR adds a "Talks you might be interested in" below each talk. This uses the newly released vector support in meilisearch to find talks that share some similarity with the current one.
Support for vector search is currently nonexistent in meilisearch-rails and meilisearch-ruby, so we cannot easily (?) fetch the similarity scores to keep only the most interesting ones, but the results are quite good as they are.
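For illustration, here is a minimal sketch of what "keeping only the most interesting ones" could look like if the scores were available. It assumes cosine similarity as the measure and uses a hypothetical `related_talks` helper with an arbitrary `min_score` threshold; none of this is part of the PR or of the Meilisearch API.

```ruby
# Cosine similarity between two equal-length embedding vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Hypothetical helper: score every candidate against the current talk and
# keep only those at or above min_score (an arbitrary illustrative value).
# `candidates` is a hash of { talk_id => embedding }.
def related_talks(current_embedding, candidates, min_score: 0.8, limit: 5)
  candidates
    .map { |id, embedding| [id, cosine_similarity(current_embedding, embedding)] }
    .select { |_, score| score >= min_score }
    .sort_by { |_, score| -score }
    .first(limit)
end
```

Calling `related_talks(current, { 1 => vec_a, 2 => vec_b })` returns `[id, score]` pairs, best match first, so low-scoring talks never reach the UI.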
To make it work, you need to specify an OpenAI API key in .env, and launch a reindex as follows:
Talk.reembed!
Note that vector support is still a bit fiddly, so you may have to start from a fresh meilisearch database if it doesn't work for you on the first try (tip: inspect GET localhost:7700/tasks to see if anything is going wrong when indexing).
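The tip above can be scripted. This is a rough sketch that fetches `/tasks` and lists the failed ones; the host, the optional API key, and the helper names `fetch_tasks` and `failed_task_summaries` are assumptions for illustration, and the payload shape (`results`, `status`, `error.message`) follows the Meilisearch tasks API as I understand it.

```ruby
require "json"
require "net/http"
require "uri"

# Fetch the task list from a local Meilisearch instance.
# Host and api_key are assumptions; adjust them to your setup.
def fetch_tasks(host: "http://localhost:7700", api_key: nil)
  uri = URI.join(host, "/tasks")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{api_key}" if api_key
  response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Keep only the failed tasks and turn each into a one-line summary.
def failed_task_summaries(tasks_payload)
  tasks_payload.fetch("results", [])
    .select { |task| task["status"] == "failed" }
    .map { |task| format("#%s %s: %s", task["uid"], task["type"], task.dig("error", "message")) }
end
```

With Meilisearch running, `puts failed_task_summaries(fetch_tasks)` prints one line per failed indexing task.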
Demo below:
https://github.com/adrienpoly/rubyvideo/assets/6114/8a7978af-3585-49ae-837e-d8f906c44bf3
- Announcement post on meilisearch - https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647
 
Closes #18.
Thanks @crohr for this prototype, it looks very promising.
Embeddings
To move forward, if we want to have this in production, I guess we could as a first step put in place the logic to compute the embeddings and store them in the talk model, with some kind of logic to recompute them every time the title or description changes. This should prevent recomputing the embeddings on every reindex.
Then Meilisearch would just index that new column from our model.
Can this work, or am I missing something?
UI
I suppose you put the list below for testing. Ultimately this should feed the cards that are on the right and replace the random suggestions. But while developing, let's keep it like that. At some point, we might want to deploy this feature to prod behind a feature flag, which will make it easier to test.
Devops
To release it to prod I will need to update the Meilisearch engine. We probably need to wait a bit, as it seems pretty new and they highly recommend waiting. That being said, whatever result we get out of it will always be better than a random suggestion.
@Kerollmops I've tried using will_save_change_to__vectors? (see https://github.com/meilisearch/meilisearch-rails#custom-attribute-definition) to make meilisearch avoid reindexing vectors in case title or description hasn't changed, but it looks like _vectors is not seen as an attribute when querying the index settings, and therefore the code in meilisearch-rails doesn't go through the will_save method.
  # this is never called
  def will_save_change_to__vectors?
    will_save_change_to_title? || will_save_change_to_description?
  end
Any reason why this line calls settings.get_attributes instead of get_attributes (get_attributes does have the _vectors key)?
> Embeddings
> To move forward, if we want to have this in production, I guess we could as a first step put in place the logic to compute the embeddings and store them in the talk model, with some kind of logic to recompute them every time the title or description changes. This should prevent recomputing the embeddings on every reindex.
> Then Meilisearch would just index that new column from our model.
> Can this work, or am I missing something?
I tried to selectively tell meilisearch to ignore _vectors when the title or description hasn't changed, but it doesn't seem to work. So yes, storing the embeddings in sqlite and recomputing them with classic AR callbacks when the title or description changes would work.
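As a rough sketch of that idea, the change check can be made explicit with a digest of the source text. This is plain Ruby rather than ActiveRecord so it runs on its own; the `TalkEmbedding` class and the `embedder` callable are hypothetical stand-ins (in the app, the embedder would call the OpenAI API and the values would live in sqlite columns, with `refresh` called from a save callback).

```ruby
require "digest"

# Plain-Ruby stand-in for the part of the Talk model that owns the embedding.
class TalkEmbedding
  attr_reader :embedding, :source_digest

  # `embedder` is any callable taking a string and returning an array of
  # floats (hypothetical; it stands in for the real embedding API call).
  def initialize(embedder)
    @embedder = embedder
    @embedding = nil
    @source_digest = nil
  end

  # Recompute only when the title or description actually changed.
  # Returns true if a new embedding was computed, false otherwise.
  def refresh(title:, description:)
    digest = Digest::SHA256.hexdigest([title, description].join("\n"))
    return false if digest == @source_digest

    @embedding = @embedder.call("#{title}\n#{description}")
    @source_digest = digest
    true
  end
end
```

Storing the digest next to the embedding means a full reindex can skip the API call for every talk whose text is unchanged.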
> UI
> I suppose you put the list below for testing. Ultimately this should feed the cards that are on the right and replace the random suggestions. But while developing, let's keep it like that. At some point, we might want to deploy this feature to prod behind a feature flag, which will make it easier to test.
Yes, I kept it simple for now, you're the UI guy :) Also, I didn't really notice that the videos on the right were supposed to be "more like the current one". I think it could make sense to keep both exploratory videos on the right and related videos below the one you've just watched, but maybe I'm wrong.
> Devops
> To release it to prod I will need to update the Meilisearch engine. We probably need to wait a bit, as it seems pretty new and they highly recommend waiting. That being said, whatever result we get out of it will always be better than a random suggestion.
Not sure how you deploy the meilisearch container / process on the server, but yes since there is nothing in meilisearch that can't be reindexed from sqlite, I think it's ok to deploy alpha/beta software in that case.
Hey @crohr
> Any reason why this line calls settings.get_attributes instead of get_attributes (get_attributes does have the _vectors key)?
Sorry for the delay. I will summon @brunoocasali on this one. I would expect this to work as for the Ruby integration, _vectors should look the same as any other field.
Note that, currently, Meilisearch isn't particularly smart: when only the title or the description is updated, it will reindex the document entirely, the vectors too! I am currently working on something that could help in this regard...
Hey @crohr, do you plan to release this recommendation system? We improved the Vector Store solution since then.
@Kerollmops thanks for the update. I will look into updating the Meilisearch version on the hosting platform. Is there an official Docker image available now? If I understand correctly, I need a 1.3+ version to enable it, right?
@adrienpoly Indeed, you need a v1.3.x, and we provide a Docker image and all sorts of binaries.
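A small guard like the following could check the version requirement before enabling the feature. It is only a sketch: `supports_vector_search?` is a hypothetical helper, the 1.3.0 minimum is taken from this thread, and how the version string is obtained from the server is left out.

```ruby
require "rubygems"

# Returns true when the given Meilisearch version string is 1.3.0 or newer.
# Accepts strings with or without a leading "v" (e.g. "v1.3.2" or "1.3.2").
def supports_vector_search?(version)
  Gem::Version.new(version.delete_prefix("v")) >= Gem::Version.new("1.3.0")
end
```

Using `Gem::Version` instead of string comparison keeps versions such as 1.10.0 ordered correctly after 1.3.0.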
> Hey @crohr, do you plan to release this recommendation system? We improved the Vector Store solution since then.
Sure, I will have another look this week, thanks for the update!
@adrienpoly this is ready to be reviewed. I've updated the description with the new task to run.
Thanks @crohr. I did a bit of preparation work in #64 to integrate the suggestion results on the front end and to isolate them in a frame, so that the initial page load for the Talk#show route is not coupled to this suggestion method.
My next step is to upgrade Meilisearch in the prod environment, and it is not as plug and play as I hoped it would be... Anyway, it is not a very high traffic site either, so if search is down for a little while that should be OK.
Will try to look back into this soon. Thanks for your work
Thanks for exploring this. Lots of changes have happened since, so I will be closing it. I am now looking at bringing this feature back with a full sqlite solution.
JoyOfRails has implemented a solution that I think is looking promising https://github.com/joyofrails/joyofrails.com/pull/280