Allow value_count to support text fields
Is your feature request related to a problem? Please describe.
We want to obtain the number of occurrences of each field in the results of a search query.
For example, I send the following query to filter the results and also get aggregations for them (how many times each field is present in the results):
{
  "query": "xyz",
  "max_hits": 3,
  "aggs": {
    "hostname": {
      "value_count": { "field": "hostname" }
    },
    "memory": {
      "value_count": { "field": "memory" }
    }
  }
}
The current result is:
{
  "data": {
    "num_hits": 132268,
    "hits": [
      {
        "hostname": "pc1",
        "memory": 4294967296
      },
      {
        "hostname": "pc2",
        "memory": 4294967296
      },
      {
        "hostname": "pc3",
        "memory": 4294967296
      }
    ],
    "aggregations": {
      "hostname": {
        "value": 0
      },
      "memory": {
        "value": 4234 (whatever value)
      }
    }
  }
}
The count for "memory" works because it is a numeric field. The count for "hostname" is always 0 because it is a text field, which value_count does not support.
Describe the solution you'd like
The result should be:
"aggregations": {
  "hostname": {
    "value": 53454
  },
  "memory": {
    "value": 4234
  }
}
That is, value_count should also count the values of the text field "hostname".
Describe alternatives you've considered
Add an extra "fields" field to each document, containing the list of fields present in it, and then run a "terms" aggregation on that field. This would probably work, but it would increase the network traffic through our Kafka pipeline and the index sizes.
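A minimal sketch of that workaround, in Python, to make the extra cost concrete. The helper name `with_fields_list`, the aggregation name `field_counts`, and the decision to skip null values are assumptions made for illustration; the sketch also assumes the "fields" field is configured in the index so that a "terms" aggregation can run on it.

```python
from typing import Any


def with_fields_list(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `doc` with an extra "fields" key listing its field names.

    Hypothetical helper: it would run before the document is published,
    so every document grows by the size of this list.
    """
    enriched = dict(doc)
    # Only list fields that actually carry a value (assumption).
    enriched["fields"] = sorted(k for k, v in doc.items() if v is not None)
    return enriched


# Search request using a "terms" aggregation on the extra field:
# one bucket per field name, whose doc count says how many matching
# documents contain that field.
terms_request = {
    "query": "xyz",
    "max_hits": 3,
    "aggs": {"field_counts": {"terms": {"field": "fields"}}},
}

doc = with_fields_list({"hostname": "pc1", "memory": 4294967296})
print(doc["fields"])  # ['hostname', 'memory']
```

Note that a "terms" bucket counts documents, not values, so for a multi-valued field this would not match what value_count returns.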
Additional context
We need the count to include every value, even repeated ones. The aggregation counts must correspond to the documents matched by the provided query.
The fix in Tantivy: https://github.com/quickwit-oss/tantivy/pull/2547
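The counting semantics requested under "Additional context" can be sketched in plain Python as follows. This is only an illustration of the expected result, not how Tantivy or Quickwit implement it; the function name `value_count`, the sample documents, and the handling of list values are assumptions.

```python
from typing import Any, Iterable


def value_count(docs: Iterable[dict[str, Any]], field: str) -> int:
    """Count every value of `field` across `docs`, repeated values included."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # field absent in this document
        # Assumption: a multi-valued field contributes one count per element.
        total += len(value) if isinstance(value, list) else 1
    return total


# Hypothetical documents that matched the query; "pc1" appears twice
# and must be counted twice.
matching_docs = [
    {"hostname": "pc1", "memory": 4294967296},
    {"hostname": "pc1", "memory": 4294967296},
    {"hostname": "pc2"},
]

counts = {f: value_count(matching_docs, f) for f in ("hostname", "memory")}
print(counts)  # {'hostname': 3, 'memory': 2}
```

The text field and the numeric field are counted the same way, and only documents matching the query are passed in.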