Store vector embeddings for talks, and display related talks

Open crohr opened this issue 2 years ago • 10 comments

This PR adds a "Talks you might be interested in" below each talk. This uses the newly released vector support in meilisearch to find talks that share some similarity with the current one.

Support for vector search is currently nonexistent in meilisearch-rails and meilisearch-ruby, so we cannot easily (?) fetch the similarity scores to keep only the most interesting ones, but the results are quite good as they are.
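To illustrate what "keeping only the most interesting ones" could look like if the scores were available, here is a minimal sketch that computes cosine similarity on the application side and filters by a threshold. It does not use any Meilisearch API; `cosine_similarity`, `most_similar`, the `min_score` threshold and the `limit` are hypothetical names and values chosen for illustration, and it assumes the embedding vectors are already loaded in memory.

```ruby
# Cosine similarity between two equal-length numeric vectors.
# Returns 0.0 when either vector has zero length (no direction to compare).
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Hypothetical helper: candidates is a Hash of id => vector.
# Keeps only candidates scoring at least min_score, best first.
def most_similar(query, candidates, min_score: 0.8, limit: 5)
  candidates
    .map { |id, vector| [id, cosine_similarity(query, vector)] }
    .select { |_, score| score >= min_score }
    .sort_by { |_, score| -score }
    .first(limit)
end
```

The threshold of 0.8 is arbitrary; a sensible value depends on the embedding model and would need to be tuned against real talks.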

To make it work, you need to specify an OpenAI API key in .env, and launch a reindex as follows:

Talk.reembed!

Note that vector support is still a bit fiddly, so you may have to start from a fresh meilisearch database if it doesn't work for you on the first try (tip: inspect GET localhost:7700/tasks to see if anything is going wrong when indexing).
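The tip above can also be scripted. The following is a minimal sketch, not part of this PR, that fetches the task list and keeps only the failed tasks. It assumes a default local instance at `http://localhost:7700` and a response shaped as an object with a `results` array whose entries carry `uid`, `type`, `status` and an `error` object; `failed_tasks`, `fetch_tasks` and `summarize` are hypothetical helper names.

```ruby
require "json"
require "net/http"
require "uri"

# Returns the tasks whose status is "failed" from a parsed /tasks response.
def failed_tasks(tasks_payload)
  tasks_payload.fetch("results", []).select { |task| task["status"] == "failed" }
end

# Fetches the task list from a Meilisearch instance.
# The base URL and the optional API key are assumptions for a local setup.
def fetch_tasks(base_url: "http://localhost:7700", api_key: nil)
  uri = URI.join(base_url, "/tasks")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{api_key}" if api_key
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Formats one failed task as a one-line summary.
def summarize(task)
  error = task["error"] || {}
  "task #{task["uid"]} (#{task["type"]}): #{error["message"]}"
end

# Usage, with a running instance:
#   failed_tasks(fetch_tasks).each { |task| puts summarize(task) }
```

Keeping the filtering separate from the HTTP call makes it possible to check the logic against a saved response without a running server.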

Demo below:

https://github.com/adrienpoly/rubyvideo/assets/6114/8a7978af-3585-49ae-837e-d8f906c44bf3

  • Announcement post on meilisearch - https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647

Closes #18.

crohr avatar Jun 20 '23 15:06 crohr

Thanks @crohr for this prototype, it looks very promising.

Embeddings

To move forward, if we want to have this in production, I guess we could, as a first step, put in place the logic to compute the embeddings and store them in the talk model, with some kind of logic to recompute them every time the title or description changes. This should prevent recomputing the embeddings on every reindex.

Then Meilisearch would just index that new column from our model.

Can this work, or am I missing something?

UI

I suppose you put the list below for testing. Ultimately this should feed the cards on the right and replace the current random suggestions. But while developing, let's keep it like that. At some point, we might want to deploy this feature to prod behind a feature flag, which will make it easier to test.

Devops

To release it to prod I will need to update the Meilisearch engine. We probably need to wait a bit, as it seems pretty new and they highly recommend waiting. That being said, whatever result we get out of it will always be better than a random suggestion.

adrienpoly avatar Jun 21 '23 05:06 adrienpoly

@Kerollmops I've tried using will_save_change_to__vectors? (see https://github.com/meilisearch/meilisearch-rails#custom-attribute-definition) to make meilisearch avoid reindexing vectors in case title or description hasn't changed, but it looks like _vectors is not seen as an attribute when querying the index settings, and therefore the code in meilisearch-rails doesn't go through the will_save method.

  # this is never called
  def will_save_change_to__vectors?
    will_save_change_to_title? || will_save_change_to_description?
  end

Any reason why this line calls settings.get_attributes instead of get_attributes (get_attributes does have the _vectors key)?

crohr avatar Jun 21 '23 21:06 crohr

Embeddings

To move forward, if we want to have this in production, I guess we could, as a first step, put in place the logic to compute the embeddings and store them in the talk model, with some kind of logic to recompute them every time the title or description changes. This should prevent recomputing the embeddings on every reindex.

Then Meilisearch would just index that new column from our model.

Can this work, or am I missing something?

I tried to selectively tell meilisearch to ignore _vectors when the title or description hasn't changed, but it doesn't seem to work. So yes, storing the embedding in sqlite and recomputing it with classic AR callbacks when the title or description changes would work.
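A rough sketch of that idea, written as plain Ruby so it runs without Rails. The `Talk` class below is a stand-in, not the app's model: in the real app the `save` logic would presumably live in a `before_save` callback guarded by `will_save_change_to_title? || will_save_change_to_description?`. The `embedder` callable, `embedding_source`, `embedding_stale?` and the digest-based change check are all assumptions made for illustration; the actual OpenAI call is deliberately left out.

```ruby
require "digest"

# Plain-Ruby stand-in for the Talk model (assumption, not the app's class).
class Talk
  attr_accessor :title, :description
  attr_reader :embedding, :embedding_digest

  # embedder: any callable taking a String and returning an Array of Floats.
  def initialize(title:, description:, embedder:)
    @title = title
    @description = description
    @embedder = embedder
    @embedding = nil
    @embedding_digest = nil
  end

  # Text the embedding is computed from.
  def embedding_source
    [title, description].join("\n")
  end

  # True when title or description differ from what was last embedded.
  def embedding_stale?
    @embedding_digest != Digest::SHA256.hexdigest(embedding_source)
  end

  # Mimics a before_save hook: recompute only when the source text changed.
  # Returns true if the embedder was called, false otherwise.
  def save
    return false unless embedding_stale?

    @embedding = @embedder.call(embedding_source)
    @embedding_digest = Digest::SHA256.hexdigest(embedding_source)
    true
  end
end
```

With the embedding stored next to the record, a reindex only has to read the stored vector, so the embedding API would be called once per content change rather than once per reindex.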

UI

I suppose you put the list below for testing. Ultimately this should feed the cards on the right and replace the current random suggestions. But while developing, let's keep it like that. At some point, we might want to deploy this feature to prod behind a feature flag, which will make it easier to test.

Yes, I kept it simple for now, you're the UI guy :) Also, I didn't really notice that the videos on the right were supposed to be "more like the current one". I think it could make sense to keep both: exploratory videos on the right and related videos below the one you've just watched, but maybe I'm wrong.

Devops

To release it to prod I will need to update the Meilisearch engine. We probably need to wait a bit, as it seems pretty new and they highly recommend waiting. That being said, whatever result we get out of it will always be better than a random suggestion.

Not sure how you deploy the meilisearch container / process on the server, but yes since there is nothing in meilisearch that can't be reindexed from sqlite, I think it's ok to deploy alpha/beta software in that case.

crohr avatar Jun 21 '23 21:06 crohr

Hey @crohr 👋

Any reason why this line calls settings.get_attributes instead of get_attributes (get_attributes does have the _vectors key)?

Sorry for the delay. I will summon @brunoocasali on this one. I would expect this to work as for the Ruby integration, _vectors should look the same as any other field.

Note that, currently, Meilisearch isn't particularly smart when only the title or the description is updated: it will reindex the document entirely, vectors included! I am currently working on something that could help in this regard...

Kerollmops avatar Jun 26 '23 15:06 Kerollmops

Hey @crohr 👋 Do you plan to release this recommendation system? We improved the Vector Store solution since then 🧇

Kerollmops avatar Sep 04 '23 13:09 Kerollmops

@Kerollmops thanks for the update, I will look into updating the Meilisearch version on the hosting platform. Is there an official Docker image available now? If I understand correctly, I need a 1.3+ version to enable it, right?

adrienpoly avatar Sep 04 '23 16:09 adrienpoly

@adrienpoly Indeed, you need a v1.3.x, and we provide a Docker image and all sorts of binaries ☺️

Kerollmops avatar Sep 05 '23 06:09 Kerollmops

Hey @crohr 👋 Do you plan to release this recommendation system? We improved the Vector Store solution since then 🧇

Sure, I will have another look this week, thanks for the update!

crohr avatar Sep 05 '23 07:09 crohr

@adrienpoly this is ready to be reviewed. I've updated the description with the new task to run.

crohr avatar Sep 06 '23 09:09 crohr

Thanks @crohr. I did a bit of preparation work in #64 to integrate the suggestion results on the front end and to isolate them into a frame, so that the initial page load for the Talk#show route is not coupled to this suggestion method.

My next step is to upgrade Meilisearch in the prod environment, and it is not as plug and play as I hoped it would be... Anyway, it is not a very high traffic site either, so if search is down for a little while that should be ok 😄

Will try to look back into this soon. Thanks for your work

adrienpoly avatar Oct 23 '23 20:10 adrienpoly

Thanks for exploring this. Lots of changes have happened since, so I will be closing it. I am now looking at bringing this feature back with a full sqlite solution.

JoyOfRails has implemented a solution that I think looks promising: https://github.com/joyofrails/joyofrails.com/pull/280

adrienpoly avatar Nov 22 '24 16:11 adrienpoly