Upgrade Solr

Open Splines opened this issue 1 year ago • 6 comments

The search engine Solr 8 reached end of life in October 2024, see here. We should upgrade to version 9 and, soon after, to version 10, which is about to be released.

Dec 14 '24 17:12 Splines

We should also consider the current state of Sunspot, the Solr gem: https://github.com/sunspot/sunspot/issues/1043

Dec 14 '24 19:12 Splines

In view of the state of the Solr gem, we might also consider implementing the search in Postgres/ActiveRecord instead. Search is not one of our heavily used features, and our data sets are comparatively small, so this might be an option (and it would rid us of another dependency).

Dec 14 '24 19:12 fosterfarrell9

Agree on that 👍 Solr is also started in its own Docker container, and Elasticsearch would be too. In the end, it is a whole search engine, and I don't really think we need such a big chunk just to let users search basic attributes. On the other hand, such search engines provide some niceties like fuzzy search... Maybe there are some smaller dependencies just for that, or even built-in PostgreSQL functions, let's see.

Dec 14 '24 19:12 Splines

It seems there are ways to do fuzzy searches in PostgreSQL using the pg_trgm extension, see here. In fact, there is even a Rails gem for that.
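To illustrate the idea behind trigram-based fuzzy matching, here is a minimal pure-Ruby sketch. It only approximates what pg_trgm does inside the database (lowercase the text, pad each word with two leading spaces and one trailing space, then compare the sets of three-character substrings). The helper names `trigrams` and `trigram_similarity` and the sample titles are made up for this example; they are not part of MaMpf or of any gem. In PostgreSQL itself, the extension's `similarity()` function would do this work.

```ruby
require "set"

# Hypothetical helper: approximates how pg_trgm derives trigrams.
# Each alphanumeric word is lowercased and padded with two leading
# spaces and one trailing space before slicing it into 3-char pieces.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Hypothetical helper: number of shared trigrams divided by the number
# of distinct trigrams in both strings, giving a value in 0.0..1.0.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?

  (ta & tb).size.to_f / union.size
end

# Made-up sample data: rank titles against a misspelled query.
titles = ["Linear Algebra", "Analysis", "Algebraic Topology"]
query = "Lineare Algbra"
ranked = titles.sort_by { |title| -trigram_similarity(query, title) }
```

Because the score is based on overlapping character triples rather than exact matches, a misspelled query still ranks the closest title first, which is the kind of fuzziness we would otherwise get from a full search engine.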

Dec 14 '24 22:12 fosterfarrell9

@Splines I see this project is related to a university. I work for a non-profit and I'd be happy to share a few helpers+concerns we've written to transition from our Solr setup to Elastic. I don't know much about how Solr is used in your environment, but (with all possible caveats) I would tend to agree with @fosterfarrell9: pg can be really good at searching for simpler use cases, and it keeps the stack pruned.

Where Postgres is limited (without some schema+code re-engineering) is multi-table search. If you have a search looking up different record types, then a proper search index will give you the best outcome. Ironically, this is NOT what the elasticsearch-rails gem is good at (it emulates your db-models for faster full-text search+retrieval).

Working around it to achieve truer cross-model search was straightforward in our case, with few limitations.

HTH

Dec 24 '24 12:12 ryders

Hey @ryders, thanks for your input. As we only use the search in a few places, we will first try out how we get along with the native pg search functionality and drop Solr entirely. Then, if we feel that this is too limiting, we might try Elasticsearch (or even something else). In that case, we'd come back to you to ask for the helpers & concerns, thanks for suggesting that to us.

In the long term, we also want to revamp the search functionality, as it is a bit confusing in the UI, e.g. we have one search only for media and another search only for lectures. We might consolidate this into one UI where users can search for anything (and apply fine-grained filters if they like). So in this case (multi-table search), Elasticsearch might indeed be a good option. We'll have to evaluate this further, and other things currently have higher priority.

A potentially interesting blog post about this topic

Dec 28 '24 02:12 Splines

Some updates on this topic:

  • As part of a university project, @f-buerckel has worked on applying LLMs to automatically subtitle MaMpf videos (even when they contain tons of math expressions). Semantic information is then extracted from the subtitles so that users can search by asking a question. As a result, they get back exact video positions, e.g. minute 3:14 in a specific video. Even though this is currently a prototype, it has great potential to bring a very powerful search to MaMpf. It is not clear at this stage whether the prototype will finally make it into MaMpf, but let's see, this is just to tease it a bit.
  • Second, @fosterfarrell9 recently stumbled upon Typesense, an open-source search engine. Maybe this is something for us, we'll take a look at it.

This issue is not a top priority for us at the moment, since there are more important issues that we need to address first, e.g. we will finally switch from Webpacker to Vite, upgrade Rails etc.

Jul 13 '25 16:07 Splines