elasticsearch-net icon indicating copy to clipboard operation
elasticsearch-net copied to clipboard

Searches with FilterAggregations fails to deserialize the response from the server.

Open asal-hesehus opened this issue 1 year ago • 1 comments

Elastic.Clients.Elasticsearch version:8.1.3

Elasticsearch version:8.6.2

.NET runtime version:6.0

Operating system version:Windows 11

Description of the problem including expected versus actual behavior: A clear and concise description of what the bug is.

Steps to reproduce: Run this code:

var settings = new ElasticsearchClientSettings(new Uri("https://localhost:9200"))
    .Authentication(new BasicAuthentication("elastic", "ELASTIC_PASSWORD"));
settings.DisableDirectStreaming();

var docs = DocGenerator.Generate(20);
settings.ThrowExceptions(false);
var client = new ElasticsearchClient(settings);
var indexName = "doc_index";

await client.Indices.CreateAsync<Doc>(x =>
{
    x.Index(indexName).Mappings(y => y.Properties(p => p.Keyword(n => n.Color)));
});

await client.IndexManyAsync(docs, indexName);


var filters = new Buckets<Query>(new Dictionary<string, Query>
    { { "color_filter", new MatchQuery("Color") { Query = "red" } } });
var aggregationDescriptor = new AggregationDescriptor<Doc>();

aggregationDescriptor.Terms("color_aggregate",
        termsAggregationDescriptor => termsAggregationDescriptor.Field(n => n.Color))
    .Filters("bucket_name",
        descriptor => descriptor.Filters(filters));

var searchRequest = new SearchRequestDescriptor<Doc>();
searchRequest.Aggregations(aggregationDescriptor);
searchRequest.Index(indexName);

/*This fails with exception */
await client.SearchAsync(searchRequest);


public class Doc
{
    public string Id { get; set; }
    public string Color { get; set; }
}

public static class DocGenerator
{
    private static string[] colors = new[] { "red", "green", "blue" };

    public static Doc[] Generate(int docCount)
    {
        var docs = new Doc[docCount];
        for (int i = 0; i < docCount; i++)
        {
            docs[i] = new Doc() { Color = colors[i % 3], Id = i.ToString() };
        }
        return docs;
    }
}

Expected behavior SearchAsync returns a search result with the aggregations filters

Provide ConnectionSettings (if relevant):

Provide DebugInformation (if relevant): Stacktrace: 'The JSON value could not be converted to System.Collections.Generic.IReadOnlyCollection1[Elastic.Clients.Elasticsearch.Aggregations.FiltersBucket].'`

asal-hesehus avatar Jul 20 '23 09:07 asal-hesehus

I see multiple problems here. During deserialization:

{
  "...": "...",
  "aggregations": {
    "filters#bucket_name": {
      "buckets": {
        "color_filter": {
          "doc_count": 0
        }
      }
    },
    "sterms#color_aggregate": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "green",
          "doc_count": 7
        },
        {
          "key": "red",
          "doc_count": 7
        },
        {
          "key": "blue",
          "doc_count": 6
        }
      ]
    }
  }
}

I guess filters#.buckets should be a IReadOnlyDictionary<string, FiltersBucket> instead of IReadOnlyCollection<FiltersBucket>.

There as well seems to be an issue with the serialization:

{
  "aggregations": {
    "color_aggregate": {
      "terms": {
        "field": "color"
      }
    },
    "bucket_name": {
      "filters": {
        "filters": {
          "color_filter": {
            "match": {
              "Color": {
                "query": "red"
              }
            }
          }
        }
      }
    }
  }
}

The filters object seems to be nested too deeply which could be the same problem as in #7822

flobernd avatar Jul 27 '23 10:07 flobernd

This is fixed in 8.13.x.

flobernd avatar Apr 18 '24 10:04 flobernd

Hi @flobernd , I think I am seeing the same issue using client version 8.14.2 and 8.14 ElasticSearch. Shouldn't this be fixed in 8.13.x?

Downloading the code and changing the Buckets property in Elastic.Clients.Elasticsearch.Aggregations.FiltersAggregate from IReadOnlyCollection to IReadOnlyDictionary, I got it to work but obviously only as a workaround.

Also, I was expecting to find the Keyed option in Elastic.Clients.Elasticsearch.Aggregations.FiltersAggregation but didn't. I tried adding the option in the generated code just to see if returning the response with keyed=false would work with the IReadOnlyCollection but got a parse error from the unexpected "key" (just as an FYI...).

Any insight on what the best approach would be on this?

EDIT: I think this relates also with #8040 which says that the issue should be resolved by 8.13.x

giorgos-papanikos avatar Jun 18 '24 15:06 giorgos-papanikos

Hi @giorgos-papanikos, your issue and the linked one are both different cases. I'll investigate tomorrow why we get keyed results here instead of a flat list.

flobernd avatar Jun 18 '24 17:06 flobernd

@flobernd thanks for the prompt reply and looking into this. Apologies for linking the unrelated ticket.

With respect to the keyed results, if my inderstanding of the problem is correct, I think that the expected behavior here is indeeed the keyed results. At least this is what I get from the behavior explained here https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html#non-keyed-response

By default the "keyed" option is set to true. That is why I mentioned also in my previous ticket that I was searching for the "keyed" property in "FiltersAggregation" to try setting it to false and try with the flat list but got a deserialization error because of the additional "key" property that was then in the response in that case.

Looking forward to hearing what your investigation brings up. Thanks

giorgos-papanikos avatar Jun 19 '24 05:06 giorgos-papanikos

@giorgos-papanikos No worries for linking the unrelated issues. It's always appreciated if users check for related issues first 🙂

Regarding the keyed situation: The client explicitly does not want keyed results and actively prevents them from being requested by skipping generation of the related property (usually this field defaults to false).

Could you please show me how you craft the query?

I tried to reproduce the issue but for me it worked correctly using this code:

var q = await client.SearchAsync<Person>(s => s
    .Aggregations(aggs => aggs
        .Add("my_agg", agg => agg
            .Filters(a => a.Filters(new Buckets<Query>([
                Query.MatchAll(new MatchAllQuery()),
                Query.MatchAll(new MatchAllQuery())
            ])))
        )
    )
);

var buckets = q.Aggregations!.GetFilters("my_agg")!.Buckets;

foreach (var bucket in buckets)
{
    Console.WriteLine(bucket.DocCount);
}
Successful (200) low level call on POST: /_search?pretty=true&error_trace=true&typed_keys=true

# Request:
{
  "aggregations": {
    "my_agg": {
      "filters": {
        "filters": [
          {
            "match_all": {}
          },
          {
            "match_all": {}
          }
        ]
      }
    }
  }
}
# Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 56,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      "...omitted..."
    ]
  },
  "aggregations" : {
    "filters#my_agg" : {
      "buckets" : [
        {
          "doc_count" : 56
        },
        {
          "doc_count" : 56
        }
      ]
    }
  }
}

It seems like the problem occurs, if you request typed keys like this:

var q = await client.SearchAsync<Person>(s => s
    .Aggregations(aggs => aggs
        .Add("my_agg", agg => agg
            .Filters(a => a.Filters(new Buckets<Query>(new Dictionary<string, Query> // <- Dictionary instead of Array
            {
                { "a", Query.MatchAll(new MatchAllQuery()) },
                { "b", Query.MatchAll(new MatchAllQuery()) }
            })))
        )
    )
);

Initializing "Buckets" with a dictionary instead of a list causes the server to return the "typed" keys. See: Anonymous Filters.

I think for best usability we should add the key property to the FilterAggregateBucket and as well automatically set keyed to true when the filters are initialized with a dictionary.

flobernd avatar Jun 19 '24 07:06 flobernd

@flobernd yes, it is as you mentioned with the second example. I am infact using the dictionary initializer because I want to be able to corellate the results using the key. Otherwise I would need to rely on the order of results as explained in the link for the anonymous filters you added, but I feel reluctant relying to that.

If I undersand the direction you mention, both cases would be supported:

  • You would provide the option to set keyed to false and expect the "key" property in the FiltersBucket deserialization
  • Setting the keyed to true would also be supported but in that case, you would still transform the result in the a flat list with the key property set using the response object representation, or parse it as a dictionary?

And if I understand correctly, setting the keyed option to true or false would be implicit based on whether the user has utilized the list or dictionary initializer?

In all cases, I think the approach should be able to cover expecations.

What would you suggest as a workaround until the fix is available? Right now I have created an internal nuget changing the Buckets property in Elastic.Clients.Elasticsearch.Aggregations.FiltersAggregate from IReadOnlyCollection to IReadOnlyDictionary.

giorgos-papanikos avatar Jun 19 '24 09:06 giorgos-papanikos

Hi @giorgos-papanikos, yes that essentially summarizes my idea.

The result will always be a flat list, but the key property will be available if the user has utilized the dictionary initializer.

And if I understand correctly, setting the keyed option to true or false would be implicit based on whether the user has utilized the list or dictionary initializer?

Correct!

What would you suggest as a workaround until the fix is available?

It's perfectly safe to rely on the order of results. The docs explicitly mention that the order is guaranteed to match the ordering of the input filters. I don't see any better workaround right now.

flobernd avatar Jun 19 '24 11:06 flobernd