meilisearch-dotnet
meilisearch-dotnet copied to clipboard
Using _Formatted value when searching is not user friendly
While I was working on the code I saw something that is not user friendly for C# and Java developers.
If you look at the documentation for Highlighting : https://docs.meilisearch.com/reference/api/search.html#attributes-to-highlight
Here is the json response of the server :
{
"id": 50393,
"title": "Kung Fu Panda Holiday",
"poster": "https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
"overview": "The Winter Feast is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal Winter Feast at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
"release_date": 1290729600,
"_formatted": {
"id": 50393,
"title": "Kung Fu Panda Holiday",
"poster": "https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
"overview": "The <em>Winter Feast</em> is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal <em>Winter Feast</em> at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
"release_date": 1290729600
}
}
The server returns the document and add an extra field _formatted
with a modified copy of the document.
Let see the problem now with C#:
We're going to work with this unit test :
https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/SearchTests.cs#L76-L94
We are working on the _basicIndex
and here is the definition of it :
https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/IndexFixture.cs#L46-L69
We are adding a collection of Movie
and here is the definition :
https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/Movie.cs#L3-L10
Now, on line 86 of the test we are doing a Search query against Meilisearch with this code :
var movies = await _basicIndex.SearchAsync<FormattedMovie>(
"man",
new SearchQuery { AttributesToHighlight = new string[] { "name" } });
We are querying an index that contains Movie
documents, but we ask to map to another class FormattedMovie
with this definition :
https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/Movie.cs#L38-L48
We have to duplicate the Movie
class with a FormattedMovie
class in order to have the _Formatted
property available.
This is not very user friendly, because users will have to create two objects in order to fully use Meilisearch.
There is no way in C# to create a specific object that will enhance a base object with a _Formatted
property like this :
public class FormattedDocument<T> : T
{
public T _Formatted { get; set; }
}
There is actually also a small risk of collision if a document contains a _formatted
property.
I told Guillaume it would great to have a response with an original
and formatted
properties that contains both a version of the document.
{
"original":{
"id":50393,
"title":"Kung Fu Panda Holiday",
"poster":"https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
"overview":"The Winter Feast is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal Winter Feast at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
"release_date":1290729600
},
"formatted":{
"id":50393,
"title":"Kung Fu Panda Holiday",
"poster":"https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
"overview":"The <em>Winter Feast</em> is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal <em>Winter Feast</em> at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
"release_date":1290729600
}
}
One more interesting thing in having a response like the one above is that if you want to add extra information like the global score of this result, or a score per field, you could enhance it easily without any risk of fields collisions.
But it's too late for breaking changes. So we need to find a way for developers (C# / Java) to enjoy this feature without too much pain.
Ping @gmourier @brunoocasali
I just checked the rust client for this issue, I'm no expert but according to the doc :
https://github.com/meilisearch/meilisearch-rust/blob/261fa529e795a831035c7fb30f27313ea9496af7/src/search.rs#L12-L25
This works thanks to serde, who has the ability to flatten structs.
The unit tests in Java are passing because the Movie
has a _formatted
property, which is not supposed to be defined in the definition of a movie.
https://github.com/meilisearch/meilisearch-java/blob/e157c2dbb014aa581a887585890dbf2ee1947c4d/src/test/java/com/meilisearch/sdk/utils/Movie.java#L5-L15
In C# we saw, that we create an index with a Movie
class and query with a FormattedMovie
class.
The way the response is sent actually does not play well with strongly type languages.
In C# and Java, the unit tests are not really well shaped, in rust this works because of the flattening feature of serde.
I think, everything is here about this issue.
Thank you, @Juchom, for your research and this very well-detailed issue.
But it's too late for breaking changes.
Indeed, we want 0 breaking changes to not involve any significant and tedious changes until 1.0, especially on the search endpoint.
Nevertheless, this is something we may consider after having reached a v1.0. We know that we will have to make some breaking changes afterward, but it's unclear how we will organize all this to be the most convenient for the users and us in terms of management. We are working on it to elaborate strategies to handle this type of change in the future.
@juchom, thanks for the issue. Indeed very well explained, and I could feel the pain here 😞
Do you think it could be possible to work around this problem internally in the meilisearch-dotnet
by adding a post json processing before the actual object creation?
return await responseMessage.Content
.ReadFromJsonAsync<SearchResult<T>>(cancellationToken: cancellationToken).ConfigureAwait(false);
Good question ? Very good question...
Digging further, I found this issue opened in System.Text.Json
in order to have a JsonPropertyFlatten
attribute like in serde
: https://github.com/dotnet/runtime/issues/55120
Not sure this is going to happen soon and there is nothing that could help with it on Github.
We only have bad solutions right now,
The worst one, trying to create our own JsonPropertyFlatten
and create an object like in rust client, not sur how long would it take to do it, test it, avoid too bad performance but it would look like something like this in the end.
public class SearchResult<T>
{
[JsonPropertyFlatten]
public T Result { get; set; }
[JsonPropertyName("_formatted")]
public T FormattedResult { get; set; }
}
Writing this, just made me realised, that the rust client is already patching the json response to map to the proposed json response thanks to serde's flatten attribute.
Second option, bad developer experience but it would work without side effect if we use custom json options write a clear documentation on how to deal with it.
In my case I'm going to wrap everything where needed and create the second class by hand.
// My indexed object
public class Movie
{
public string Id { get; set; }
public string Name { get; set; }
public string Genre { get; set; }
}
// The search result class, a bit like the unit tests, but we keep refactoring tools working
public class MovieSearchResult : Movie
{
[JsonPropertyName("_formatted")]
public Movie Formatted { get; set; }
}
Maybe a source generator could help doing this second class automatically for the developer, but I have never used them before and I'm not sure it would be good idea to offer this option if it works. (Andrew Lock has a very good blog about them : https://andrewlock.net/creating-a-source-generator-part-1-creating-an-incremental-source-generator/)
In my case I don't have a bunch of documents so doing it by hand is ok, but for people with bigger projects source generators is probably the best bet.
Concerning Java, I think code generators exist but I have no idea of what we can do. There are some project on github to flatten json objects, but it seems that it's not following the same pattern like serde.
In my case I want to be extra safe, I would go with writing a second inherited class and let System.Text.Json
do the job.
Concerning Meilisearch server, here is the response object :
https://github.com/meilisearch/meilisearch/blob/c4453340707f0b8ed6168118dff2e4d2ae634f8d/meilisearch-lib/src/index/search.rs#L86-L94
/// A single result.
/// Contains the complete object, optionally the formatted object, and optionally an object that contains information about the matches.
#[derive(Deserialize, Debug)]
pub struct SearchResult<T> {
/// The full result.
#[serde(flatten)]
pub result: T,
/// The formatted result.
#[serde(rename = "_formatted")]
pub formatted_result: Option<Map<String, Value>>,
/// The object that contains information about the matches.
#[serde(rename = "_matchesPosition")]
pub matches_position: Option<HashMap<String, Vec<MatchRange>>>,
}
Without #[serde(flatten)]
this object looks like the proposed json response.
I would encourage you to think twice before using #[serde(flatten)]
, this is making complex transformations when serializing / deserializing object to json. This suppose that this transformation must be available in every langages with all the same rules.
Just to be clear, there is no arrogance in my last sentence, I have been caught in this trap once it was a GUID serialization / deserialization issue between Python and C#.
And again your are doing a really good job and Meilisearch is a really great product 👍
Thanks @juchom ! Came across this caveat and could immediately workaround it with your approach.
As the SDK does not support highlights, maybe the readme should mention it? It even gives an example using AttributesToHighlight
without mentioning that this won’t actually work.
For me it works fine, I got highlighting in all "attrA", "attrB", "attrC".
var myIndexSettings = new Settings
{
DisplayedAttributes = new string[] { "*" },
SearchableAttributes = new string[] { "attrA", "attrB", "attrC" },
};
var result = await Index.SearchAsync<MyType>(
queryString,
new SearchQuery()
{
AttributesToHighlight = new string[] { "attrA", "attrB", "attrC" }, // will be returned formatted
AttributesToCrop = new string[] { "attrC" }, // will be returned formatted
HighlightPreTag = "<mark>",
HighlightPostTag = "</mark>",
CropLength = 60,
},
cancellationToken: cancellationToken
);
That is weird. Is that on the master build, or the currently released NuGet build?
If I call it with AttributesToHighlight = new string[] { "text" },
I get no formatting, but if I inspect the HttpResponseMessage, I see the "_formatted"
object in JSON that has the expected highlighting.
Did you implement juchom's solution with the search result class that inherits from the actual type you have an index for? If not, the formatted json prop will be omitted during deserialization since there is no matching prop in your type.
No, that’s my whole point, that it’s not supported out of the box.
@Christoph-Wagner that's right, it's not supported directly because there is no clean strategy to support this json transformation made server side.
This issue has been discovered after the api freeze this is why it works like this for now and you have to chose what best fit your needs.
Now that v1 is out, maybe @gmourier or @brunoocasali may have some insight on this issue.
@juchom Recently while working on the meilisearch flutter ui kit, I tried an approach, where we wrapped all the returned results from meilisearch with a helper object that would understand all the extra fields added by meilisearch and parse them correctly. Relevant code: https://github.com/meilisearch/meilisearch-flutter/blob/58ccfc5a4cb4bea36fee0fa52817c70e5d3c4944/lib/src/models/result_container.dart#L3-L16
class MeilisearchResultContainer<T> {
final T parsed;
final Map<String, dynamic> src;
final Map<String, dynamic>? formatted;
final Searcheable<Map<String, Object?>> fromResult;
final SearchQuery fromQuery;
MeilisearchResultContainer({
required this.src,
required this.parsed,
required this.fromQuery,
required this.fromResult,
}) : formatted = src['_formatted'];
}
it would also reference the source query and the parent search result, but these can be omitted of course.
this made handling meilisearch specific fields way easier
If the product team decide to keep the original object without transformation and make this change soon, then I think it will be better to wait.
Otherwise, it will be probably wise to use your work and provide a solution that makes dotnet devs life easier.
I can safely state that a v2 is not in our current plans. And also, the same behavior is being used in other features like the new vector store feature (_semanticSimilarity
, _vectors
), and ranking details (_rankingScore
, _rankingScoreDetails
).
So I think if the implementation made by @ahmednfwela is good enough, we should definitely merge it :)
Then in the future, when Meilisearch starts to plan a possible v2, we can decide to go for a meta: {}
key that wraps all those keys at once or something similar if needed.
Are you ok with that @juchom and @ahmednfwela?
I have the same issue. Any updates on this? There are workarounds to fix it, but as other people mentioned, the SearchAsync
method offers highlighting. You would be surprised that it doesn't work out of the box. Fixing it using either a custom JSON converter or code generator is better than nothing, IMO.