Search collections with scopes
Description
I want to discuss the use of collections in meilisearch-rails.
One could use the filterable attributes for simple cases. For example:
Article.ms_search('awesome', filter: "release_date > #{(Time.now - 1.day).to_i}")
Now, since meilisearch-rails already queries the database to find items by their primary keys, my idea is that meilisearch-rails could fully support collections as well, making it possible to run fewer queries in more advanced scenarios.
Basic example
Let's start with a simple example to illustrate:
# This works, because the collection "Article.where('id < 50')" is used as "meilisearch_options[:type]" at "meilisearch_options[:type].where(condition_key => hit_ids)"
Article.where('id < 50').ms_search(Article.first.name)
# This does not work ("undefined method `where' for #<Meilisearch::Rails::Pagination::Kaminari:0x0000560335df3660>")
Article.ms_search(Article.first.name).where('id < 50')
I think the order of scopes should not matter.
Perhaps the ms_search method could just return something like this: "where(id: hit_ids)"
In this case (correct me if I'm missing something), I think pagination might not be necessary in this gem at all. It could be handled externally.
For example:
Article.ms_search('awesome').joins(:user).merge(User.admin).page(3).per(10)
would be equivalent to:
Article.where(id: [1, 10, 15]).joins(:user).merge(User.admin).page(3).per(10)
Other
- I consider this issue related to:
- Issue #340 - Polymorphic shared indexes: The discussion is about an option to search an index and return a polymorphic collection with results from multiple models.
- Issue #341 - Index-first search: The discussion is about the fact "the starting point of any search is a Rails model, which is not conducive to things like shared indexes or multiple indexes". It proposes the creation of a new type of resource in a Rails app called an index, capable of returning polymorphic results (AnimalIndex.search returns Cats and Dogs).
- Issue #364 - Support for multiple searches of the same index in Multi Search
- Issue #389 - Support federated search: The federated multi-search makes it possible to use a single pagination for different models. The support for pagination backends (Kaminari and will_paginate) is missing at FederatedSearchResult.
- PR #391 - Allow custom group names in multi search
- PR #393 - Add federated search
Regarding the idea of abandoning pagination, I wonder if it will perform well, since the raw search results would include all items. Perhaps using displayed_attributes [:id] is sufficient.
What do you think guys?
Examples:
Video.ms_search('', sort: ['name:desc']).where('id < 50').page(1).per(1).first.id
=> 48
Video.ms_search('', sort: ['name:asc']).where('id < 50').page(1).per(1).first.id
=> 47
@ellnix @brunoocasali
I imagine something like this for the multisearch:
search_result = MeiliSearch::Rails.multi_search(
  indexes: {
    media: {
      queries: {
        audios: {
          q: 'Michael Jackson',
          collection: Audio.user(user),
          filter: ['bitrate > 180000'],
          page: 3
        },
        videos: {
          q: 'Michael Jackson',
          collection: Video.user(user),
          filter: ['height >= 1080'],
          page: 8
        }
      }
    },
    shows: {
      q: 'Michael Jackson',
      collection: Show.all, # Can be the default value
      filter: ['alone = TRUE'],
      page: 3
    }
  }
)
audios = search_result.matches[:media][:audios] # Audio::ActiveRecord_Relation
videos = search_result.matches[:media][:videos] # Video::ActiveRecord_Relation
shows = search_result.matches[:shows] # Show::ActiveRecord_Relation
search_result = MeiliSearch::Rails.federated_search(
  indexes: {
    media: {
      queries: {
        audios: {
          q: 'Michael Jackson',
          collection: Audio.user(user),
          filter: ['bitrate > 180000']
        },
        videos: {
          q: 'Michael Jackson',
          collection: Video.user(user),
          filter: ['height >= 1080']
        }
      }
    },
    shows: {
      q: 'Michael Jackson',
      collection: Show.all,
      filter: nil
    }
  },
  page: 3
)
matches = search_result.matches
Regarding the idea of abandoning pagination, I wonder if it will perform well, since the raw search results would include all items. Perhaps using displayed_attributes [:id] is sufficient.
Indeed... If you don't paginate, then even when you only handle the ids rather than the full Meilisearch response, you will run into performance issues somewhere.
But I do agree with using only the ids. I even started a proof of concept here https://github.com/meilisearch/meilisearch-rails/issues/193, but I didn't have time to finish it, and the spec suite was not reliable at that time.
Sorry for the late reply!
search_result = MeiliSearch::Rails.multi_search('Michael Jackson' ...)
Multi search and federated search actually do not require that every request has the same query: https://www.meilisearch.com/docs/reference/api/multi_search#queries . You can search multiple indexes with different queries.
I imagine there's not much practical use for this, but it would be wrong not to support it.
Similarly, the filter is not shared among queries.
Passing a collection to multi search is a good idea, but I think it should be a separate issue.
The issue with implementing most of this is that ActiveRecord::Relation is harder to work with than plain Ruby arrays, especially when it comes to making sure the sorting is correct.
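To make the sorting concern concrete, here is a minimal plain-Ruby sketch of the two usual ways to restore the Meilisearch ranking after a database lookup. It is not the gem's implementation. The Record struct and the helper names sort_by_hit_order and order_by_hits_sql are hypothetical, and the SQL fragment assumes integer primary keys and a trusted column name.

```ruby
# Stand-in for a database row; a real app would use an ActiveRecord model.
Record = Struct.new(:id, :name)

# Array approach: sort already-loaded records by the position of their id
# in hit_ids. Records whose id is not among the hits are dropped.
def sort_by_hit_order(records, hit_ids)
  position = hit_ids.each_with_index.to_h # { id => rank }
  records.select { |r| position.key?(r.id) }
         .sort_by { |r| position[r.id] }
end

# Relation approach: build an ORDER BY fragment so the database sorts the
# rows itself and the result can stay a lazy, chainable relation.
# Assumption: ids are integers, so no quoting is required.
def order_by_hits_sql(column, hit_ids)
  ids = hit_ids.map { |id| Integer(id) }
  return nil if ids.empty?

  whens = ids.each_with_index.map { |id, rank| "WHEN #{id} THEN #{rank}" }
  "CASE #{column} #{whens.join(' ')} END"
end

hit_ids = [15, 1, 10] # ranking order returned by the search
rows = [Record.new(1, 'a'), Record.new(10, 'b'), Record.new(15, 'c'), Record.new(99, 'd')]

sorted_ids = sort_by_hit_order(rows, hit_ids).map(&:id)
order_sql = order_by_hits_sql('id', hit_ids)
```

With ActiveRecord, a fragment like this would typically be passed to order wrapped in Arel.sql. Rails 7 and later also provide in_order_of, which may cover the same need without hand-written SQL.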
Some comments on the current situation:
- It seems that we already have an issue when handling pagination even in simple searches (non-multi), as the search is performed on the index (not the collection, which is a subset of the index). This means that index pagination may not match collection pagination. For example, a search on the index with a limit of 10 results may return fewer than 10 results because the collection’s scopes are applied only afterward.
- Currently, multi_search (e.g., in FederatedSearchResult) does not seem to support queries on collections. It only works with models or simple indexes. (Will be fixed in PR #405.)
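The pagination mismatch from the first point can be reproduced with plain arrays. This is only a simulation under assumed data: ranked_hit_ids stands in for the index ranking, and visible_ids stands in for a scope such as Video.user(current_user).

```ruby
# Ids as the index would rank them, best match first.
ranked_hit_ids = (1..30).to_a
# Ids the scope lets through; here, arbitrarily, the even ones.
visible_ids = ranked_hit_ids.select(&:even?)

per_page = 10

# Current behaviour: paginate in the index, apply the scope afterwards.
index_page = ranked_hit_ids.first(per_page)
page_after_scope = index_page & visible_ids # Array#& keeps the left-hand order

# Alternative: apply the scope to all hits first, then paginate.
scoped_hits = ranked_hit_ids & visible_ids
page_after_pagination = scoped_hits.first(per_page)

puts "index-first page size: #{page_after_scope.size}"
puts "scope-first page size: #{page_after_pagination.size}"
```

In this simulation the index-first page holds only 5 records although 10 were requested, while the scope-first page is full.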
Now, I will present a more complete scenario to illustrate the issue I currently see and why I believe meilisearch-rails should return collections (while enforcing an order, of course).
Suppose the following models: Audio, Image, Video.
In this scenario, searches should retrieve only media that the user has access to (information that is not worth indexing in Meilisearch's database).
Therefore, the collections will be:
audios = Audio.user(current_user)
images = Image.user(current_user)
videos = Video.user(current_user)
Let's assume the primary key is the id field.
Let's consider two different types of searches:
- A simple search in one of the models, for example, Video.
- A federated search across these models.
In a simple search for videos, we have the following queries:
- A query to Meilisearch (ms_raw_search).
- A query to the database (Video.user(current_user).where(condition_key => hit_ids)).
In a federated search, since it is not possible to use collections but only models or simple indexes (will be fixed in PR #405), I must first perform an additional step:
- Queries to retrieve the IDs of each record. Example:
audio_ids = Audio.user(current_user).pluck(:id)
image_ids = Image.user(current_user).pluck(:id)
video_ids = Video.user(current_user).pluck(:id)
- A query to Meilisearch, filtering the results by ID.
- Queries to the database to retrieve the attributes of each record.
Note that if meilisearch-rails simply applied scopes to the collections passed to it, the first step above would not be necessary.
On the other hand, with the current behavior, if meilisearch-rails forces the query to the database and returns the records, it must already know which records should not be displayed (in this case, records the user does not have access to) for pagination to work correctly, as explained at the beginning. In other words, additional queries end up being necessary.
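For reference, the extra step in the current federated flow (collecting ids and turning them into a Meilisearch filter) might look roughly like the sketch below. The helpers id_filter and with_id_filter are hypothetical, the ids are hard-coded in place of the pluck calls, and the sketch assumes that id is declared as a filterable attribute on the index.

```ruby
# Builds a filter expression that restricts hits to the given ids,
# using Meilisearch's IN operator. Ids are coerced to integers.
def id_filter(ids, attribute: 'id')
  "#{attribute} IN [#{ids.map { |id| Integer(id) }.join(', ')}]"
end

# Appends the id restriction to an existing filter array.
# Top-level entries of a filter array are combined with AND.
def with_id_filter(filters, ids)
  Array(filters) + [id_filter(ids)]
end

audio_ids = [3, 7, 42] # stands in for Audio.user(current_user).pluck(:id)
audio_filter = with_id_filter(['bitrate > 180000'], audio_ids)
show_filter = with_id_filter(nil, [1, 2])
```

The id list grows with the number of records the user can access, which is part of why this step is worth avoiding.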
If meilisearch-rails returns collections, we can simplify the federated search into two steps:
- A query to Meilisearch to obtain the IDs of the records matching each query. Something like where(id: hit_ids).order(...) is applied to each collection.
- Queries to the database to retrieve the attributes of each record.
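The two steps above can be simulated in plain Ruby. This is a sketch under stated assumptions, not the gem's behavior: the hit hashes with :index and :id keys are a simplified, illustrative shape rather than the exact Meilisearch response, and the DATABASE hash stands in for scoped collections (the user in this example cannot see image 9).

```ruby
# Federated hits in ranking order (simplified, illustrative shape).
hits = [
  { index: 'videos', id: 7 },
  { index: 'audios', id: 3 },
  { index: 'videos', id: 2 },
  { index: 'images', id: 9 }
]

# Rows each scoped collection would return; image 9 is not accessible.
DATABASE = {
  'videos' => { 2 => 'Video#2', 7 => 'Video#7' },
  'audios' => { 3 => 'Audio#3' },
  'images' => {}
}.freeze

# Step 1: a single pass over the hits yields the ids to load per collection.
ids_by_index = hits.group_by { |hit| hit[:index] }
                   .transform_values { |group| group.map { |hit| hit[:id] } }

# Step 2: one lookup per collection. In Rails this would be something like
# collection.where(id: ids), with the user's scope already applied.
loaded = ids_by_index.to_h do |index, ids|
  [index, DATABASE.fetch(index, {}).slice(*ids)]
end

# Merge back in federated ranking order, skipping records the scope removed.
results = hits.filter_map { |hit| loaded.dig(hit[:index], hit[:id]) }
```

No ids need to be fetched from the database before the search here. The scope is applied only when the matching records are loaded.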
So, how do scoped collections and abandoning pagination within the gem improve this situation?
- Pagination is handled by the application based on the resulting collection.
- The extra query that evaluates the collection result before passing it to meilisearch-rails is no longer needed. This extra query could already be avoided if federated search supported collections (will be fixed in PR #405), just as simple search does. However, we would still face the pagination issue.
In summary, if meilisearch-rails returned collections instead of forcing queries to the database, we could eliminate redundant queries and handle pagination more effectively at the application level.
Note: I don't want to centralize the entire discussion in this issue. I know I've raised several points, and possibly even some bugs. My intention is to investigate them more thoroughly and then create specific issues for each. The goal here was just to gather the necessary elements for this discussion.