meilisearch-dotnet icon indicating copy to clipboard operation
meilisearch-dotnet copied to clipboard

Using _Formatted value when searching is not user friendly

Open juchom opened this issue 2 years ago • 16 comments

While I was working on the code I saw something that is not user friendly for C# and Java developers.

If you look at the documentation for Highlighting : https://docs.meilisearch.com/reference/api/search.html#attributes-to-highlight

Here is the json response of the server :

{
  "id": 50393,
  "title": "Kung Fu Panda Holiday",
  "poster": "https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
  "overview": "The Winter Feast is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal Winter Feast at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
  "release_date": 1290729600,
  "_formatted": {
    "id": 50393,
    "title": "Kung Fu Panda Holiday",
    "poster": "https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
    "overview": "The <em>Winter Feast</em> is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal <em>Winter Feast</em> at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
    "release_date": 1290729600
  }
}

The server returns the document and add an extra field _formatted with a modified copy of the document.

Let see the problem now with C#:

We're going to work with this unit test :

https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/SearchTests.cs#L76-L94

We are working on the _basicIndex and here is the definition of it :

https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/IndexFixture.cs#L46-L69

We are adding a collection of Movie and here is the definition :

https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/Movie.cs#L3-L10

Now, on line 86 of the test we are doing a Search query against Meilisearch with this code :

     var movies = await _basicIndex.SearchAsync<FormattedMovie>( 
         "man", 
         new SearchQuery { AttributesToHighlight = new string[] { "name" } }); 

We are querying an index that contains Movie documents, but we ask to map to another class FormattedMovie with this definition :

https://github.com/meilisearch/meilisearch-dotnet/blob/3302d2b7c529c36de31298aaed7ac62f414170a1/tests/Meilisearch.Tests/Movie.cs#L38-L48

We have to duplicate the Movie class with a FormattedMovie class in order to have the _Formatted property available.

This is not very user friendly, because users will have to create two objects in order to fully use Meilisearch.

There is no way in C# to create a specific object that will enhance a base object with a _Formatted property like this :

public class FormattedDocument<T> : T
{
    public T _Formatted { get; set; }
}

There is actually also a small risk of collision if a document contains a _formatted property.

I told Guillaume it would great to have a response with an original and formatted properties that contains both a version of the document.

{
   "original":{
      "id":50393,
      "title":"Kung Fu Panda Holiday",
      "poster":"https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
      "overview":"The Winter Feast is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal Winter Feast at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
      "release_date":1290729600
   },
   "formatted":{
      "id":50393,
      "title":"Kung Fu Panda Holiday",
      "poster":"https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
      "overview":"The <em>Winter Feast</em> is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal <em>Winter Feast</em> at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
      "release_date":1290729600
   }
}

One more interesting thing in having a response like the one above is that if you want to add extra information like the global score of this result, or a score per field, you could enhance it easily without any risk of fields collisions.

But it's too late for breaking changes. So we need to find a way for developers (C# / Java) to enjoy this feature without too much pain.

Ping @gmourier @brunoocasali

juchom avatar Sep 01 '22 15:09 juchom

I just checked the rust client for this issue, I'm no expert but according to the doc :

https://github.com/meilisearch/meilisearch-rust/blob/261fa529e795a831035c7fb30f27313ea9496af7/src/search.rs#L12-L25

This works thanks to serde, who has the ability to flatten structs.

The unit tests in Java are passing because the Movie has a _formatted property, which is not supposed to be defined in the definition of a movie.

https://github.com/meilisearch/meilisearch-java/blob/e157c2dbb014aa581a887585890dbf2ee1947c4d/src/test/java/com/meilisearch/sdk/utils/Movie.java#L5-L15

In C# we saw, that we create an index with a Movie class and query with a FormattedMovie class.

The way the response is sent actually does not play well with strongly type languages.

In C# and Java, the unit tests are not really well shaped, in rust this works because of the flattening feature of serde.

I think, everything is here about this issue.

juchom avatar Sep 02 '22 08:09 juchom

Thank you, @Juchom, for your research and this very well-detailed issue.

But it's too late for breaking changes.

Indeed, we want 0 breaking changes to not involve any significant and tedious changes until 1.0, especially on the search endpoint.

Nevertheless, this is something we may consider after having reached a v1.0. We know that we will have to make some breaking changes afterward, but it's unclear how we will organize all this to be the most convenient for the users and us in terms of management. We are working on it to elaborate strategies to handle this type of change in the future.

gmourier avatar Sep 02 '22 09:09 gmourier

@juchom, thanks for the issue. Indeed very well explained, and I could feel the pain here 😞

Do you think it could be possible to work around this problem internally in the meilisearch-dotnet by adding a post json processing before the actual object creation?

return await responseMessage.Content
                .ReadFromJsonAsync<SearchResult<T>>(cancellationToken: cancellationToken).ConfigureAwait(false);

brunoocasali avatar Sep 02 '22 13:09 brunoocasali

Good question ? Very good question...

Digging further, I found this issue opened in System.Text.Json in order to have a JsonPropertyFlatten attribute like in serde : https://github.com/dotnet/runtime/issues/55120

Not sure this is going to happen soon and there is nothing that could help with it on Github.

We only have bad solutions right now,

The worst one, trying to create our own JsonPropertyFlatten and create an object like in rust client, not sur how long would it take to do it, test it, avoid too bad performance but it would look like something like this in the end.

public class SearchResult<T>
{
    [JsonPropertyFlatten]
    public T Result { get; set; }

    [JsonPropertyName("_formatted")]
    public T FormattedResult { get; set; }
}

Writing this, just made me realised, that the rust client is already patching the json response to map to the proposed json response thanks to serde's flatten attribute.

Second option, bad developer experience but it would work without side effect if we use custom json options write a clear documentation on how to deal with it.

In my case I'm going to wrap everything where needed and create the second class by hand.

// My indexed object
public class Movie 
{ 
    public string Id { get; set; } 
  
    public string Name { get; set; } 
  
    public string Genre { get; set; } 
} 

// The search result class, a bit like the unit tests, but we keep refactoring tools working
public class MovieSearchResult : Movie
{
    [JsonPropertyName("_formatted")]
    public Movie Formatted { get; set; } 
} 

Maybe a source generator could help doing this second class automatically for the developer, but I have never used them before and I'm not sure it would be good idea to offer this option if it works. (Andrew Lock has a very good blog about them : https://andrewlock.net/creating-a-source-generator-part-1-creating-an-incremental-source-generator/)

In my case I don't have a bunch of documents so doing it by hand is ok, but for people with bigger projects source generators is probably the best bet.

Concerning Java, I think code generators exist but I have no idea of what we can do. There are some project on github to flatten json objects, but it seems that it's not following the same pattern like serde.

In my case I want to be extra safe, I would go with writing a second inherited class and let System.Text.Json do the job.

Concerning Meilisearch server, here is the response object :

https://github.com/meilisearch/meilisearch/blob/c4453340707f0b8ed6168118dff2e4d2ae634f8d/meilisearch-lib/src/index/search.rs#L86-L94

/// A single result.
/// Contains the complete object, optionally the formatted object, and optionally an object that contains information about the matches.
#[derive(Deserialize, Debug)]
pub struct SearchResult<T> {
    /// The full result.
    #[serde(flatten)]
    pub result: T,
    /// The formatted result.
    #[serde(rename = "_formatted")]
    pub formatted_result: Option<Map<String, Value>>,
    /// The object that contains information about the matches.
    #[serde(rename = "_matchesPosition")]
    pub matches_position: Option<HashMap<String, Vec<MatchRange>>>,
}

Without #[serde(flatten)] this object looks like the proposed json response.

I would encourage you to think twice before using #[serde(flatten)], this is making complex transformations when serializing / deserializing object to json. This suppose that this transformation must be available in every langages with all the same rules.

Just to be clear, there is no arrogance in my last sentence, I have been caught in this trap once it was a GUID serialization / deserialization issue between Python and C#.

And again your are doing a really good job and Meilisearch is a really great product 👍

juchom avatar Sep 02 '22 16:09 juchom

Thanks @juchom ! Came across this caveat and could immediately workaround it with your approach.

FelixLorenz avatar Oct 05 '22 08:10 FelixLorenz

As the SDK does not support highlights, maybe the readme should mention it? It even gives an example using AttributesToHighlight without mentioning that this won’t actually work.

Christoph-Wagner avatar Feb 08 '23 14:02 Christoph-Wagner

For me it works fine, I got highlighting in all "attrA", "attrB", "attrC".


var myIndexSettings = new Settings
{
        DisplayedAttributes = new string[] { "*" },
        SearchableAttributes = new string[] { "attrA", "attrB", "attrC" },
};

var result = await Index.SearchAsync<MyType>(
                  queryString,
                  new SearchQuery()
                  {
                      AttributesToHighlight = new string[] { "attrA", "attrB", "attrC" }, // will be returned formatted
                      AttributesToCrop = new string[] { "attrC" }, // will be returned formatted
                      HighlightPreTag = "<mark>",
                      HighlightPostTag = "</mark>",
                      CropLength = 60,
                  },
                  cancellationToken: cancellationToken
                  );

FelixLorenz avatar Feb 08 '23 20:02 FelixLorenz

That is weird. Is that on the master build, or the currently released NuGet build?

If I call it with AttributesToHighlight = new string[] { "text" }, I get no formatting, but if I inspect the HttpResponseMessage, I see the "_formatted" object in JSON that has the expected highlighting.

Christoph-Wagner avatar Feb 09 '23 14:02 Christoph-Wagner

Did you implement juchom's solution with the search result class that inherits from the actual type you have an index for? If not, the formatted json prop will be omitted during deserialization since there is no matching prop in your type.

FelixLorenz avatar Feb 09 '23 15:02 FelixLorenz

No, that’s my whole point, that it’s not supported out of the box.

Christoph-Wagner avatar Feb 09 '23 15:02 Christoph-Wagner

@Christoph-Wagner that's right, it's not supported directly because there is no clean strategy to support this json transformation made server side.

This issue has been discovered after the api freeze this is why it works like this for now and you have to chose what best fit your needs.

Now that v1 is out, maybe @gmourier or @brunoocasali may have some insight on this issue.

juchom avatar Feb 13 '23 08:02 juchom

@juchom Recently while working on the meilisearch flutter ui kit, I tried an approach, where we wrapped all the returned results from meilisearch with a helper object that would understand all the extra fields added by meilisearch and parse them correctly. Relevant code: https://github.com/meilisearch/meilisearch-flutter/blob/58ccfc5a4cb4bea36fee0fa52817c70e5d3c4944/lib/src/models/result_container.dart#L3-L16

class MeilisearchResultContainer<T> {
  final T parsed;
  final Map<String, dynamic> src;
  final Map<String, dynamic>? formatted;
  final Searcheable<Map<String, Object?>> fromResult;
  final SearchQuery fromQuery;

  MeilisearchResultContainer({
    required this.src,
    required this.parsed,
    required this.fromQuery,
    required this.fromResult,
  }) : formatted = src['_formatted'];
}

it would also reference the source query and the parent search result, but these can be omitted of course.

this made handling meilisearch specific fields way easier

ahmednfwela avatar Jun 28 '23 23:06 ahmednfwela

If the product team decide to keep the original object without transformation and make this change soon, then I think it will be better to wait.

Otherwise, it will be probably wise to use your work and provide a solution that makes dotnet devs life easier.

juchom avatar Jun 29 '23 08:06 juchom

I can safely state that a v2 is not in our current plans. And also, the same behavior is being used in other features like the new vector store feature (_semanticSimilarity, _vectors), and ranking details (_rankingScore, _rankingScoreDetails).

So I think if the implementation made by @ahmednfwela is good enough, we should definitely merge it :)

Then in the future, when Meilisearch starts to plan a possible v2, we can decide to go for a meta: {} key that wraps all those keys at once or something similar if needed.

Are you ok with that @juchom and @ahmednfwela?

brunoocasali avatar Jul 06 '23 15:07 brunoocasali

I have the same issue. Any updates on this? There are workarounds to fix it, but as other people mentioned, the SearchAsync method offers highlighting. You would be surprised that it doesn't work out of the box. Fixing it using either a custom JSON converter or code generator is better than nothing, IMO.

MoienTajik avatar May 31 '24 14:05 MoienTajik