Aggregation Feature Parity with Elasticsearch

Open • PSeitz opened this issue 3 years ago • 1 comment

This list gives an overview of which aggregations tantivy still lacks to reach feature parity with Elasticsearch.

Bucket Aggregations

Supported

Terms 
    Only text, u64, i64, and f64 fields are currently supported. Unsupported field types: date, boolean, ip, bytes.
    No support for the include and exclude filters.
Range (no support for ip)
Histogram (no support for ip)
Date histogram (no support for calendar_interval)
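To illustrate why calendar_interval is a separate piece of work from fixed-width intervals, here is a minimal standalone sketch. It does not use tantivy's API, and the function names `fixed_interval_bucket` and `days_in_month` are hypothetical. A fixed interval maps a timestamp to its bucket with plain integer arithmetic, while a calendar interval such as a month has no constant width, so bucket boundaries need calendar logic.

```rust
/// Start of the fixed-width bucket that contains `timestamp_ms`.
/// `div_euclid` rounds toward negative infinity, so timestamps before
/// the epoch also land in the correct bucket.
fn fixed_interval_bucket(timestamp_ms: i64, interval_ms: i64) -> i64 {
    timestamp_ms.div_euclid(interval_ms) * interval_ms
}

/// Number of days in a month of the proleptic Gregorian calendar.
/// Returns `None` for a month outside 1..=12.
fn days_in_month(year: i32, month: u32) -> Option<u32> {
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if leap { 29 } else { 28 }),
        _ => None,
    }
}
```

Because `days_in_month` varies between 28 and 31, a monthly bucket cannot be expressed as a single `interval_ms` passed to `fixed_interval_bucket`.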

Not Implemented

Adjacency matrix
Auto-interval date histogram
Categorize text
Children
Composite
Date range
Diversified sampler
Filter
Filters
Frequent items
Geo-distance
Geohash grid
Geohex grid
Geotile grid
Global
IP prefix
IP range
Missing
Multi Terms
Nested
Parent
Random sampler
Rare terms
Reverse nested
Sampler
Significant terms
Significant text
Variable width histogram

Note: Some aggregations have missing prerequisites, e.g. storing nulls or a geo type.

Metric Aggregations

Supported

Avg
Stats
Max
Min
Value count
Percentiles
Cardinality
Top hits
Extended stats

Not Implemented

Boxplot
Geo-bounds
Geo-centroid
Geo-Line
Matrix stats
Median absolute deviation
Percentile ranks
Rate
Scripted metric
String stats
Sum (covered by Stats)
T-test
Top metrics
Weighted avg

Pipeline Aggregations

Unsupported. Supporting pipeline aggregations would require a relatively large overhaul of how aggregations are collected.
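For context, a pipeline aggregation in Elasticsearch operates on the output of other aggregations rather than on documents. The sketch below shows a derivative-style step over already finalized histogram buckets. It only illustrates what such an aggregation computes, not how tantivy would implement it; the `Bucket` type and `derivative` function are hypothetical.

```rust
/// One finalized histogram bucket: its key and a metric value computed for it.
struct Bucket {
    key: f64,
    value: f64,
}

/// Derivative-style pipeline step: for each bucket, the change in `value`
/// relative to the previous bucket. The first bucket has no predecessor,
/// so its result is `None`.
fn derivative(buckets: &[Bucket]) -> Vec<(f64, Option<f64>)> {
    let mut out = Vec::with_capacity(buckets.len());
    let mut prev: Option<f64> = None;
    for bucket in buckets {
        out.push((bucket.key, prev.map(|p| bucket.value - p)));
        prev = Some(bucket.value);
    }
    out
}
```

The input here is the complete, ordered bucket list, so a step like this can only run after the bucket aggregation it reads from has finished.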

PSeitz • Nov 22 '22 12:11

For the airmail project, diff and percentiles are required:

  • diff returns the largest difference between values for each interval
  • percentiles returns the nth percentile for each interval (75th, 85th, 95th, and 99th percentile)

guilload • Dec 16 '22 15:12
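The two per-interval metrics requested in the comment above can be sketched in plain Rust as follows. This is a standalone illustration under stated assumptions, not tantivy's API: "diff" is read as max minus min within each interval, the percentile uses the exact nearest-rank method (real engines often use approximate sketches instead), and the names `group_by_interval`, `diff`, and `percentile` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Groups (timestamp, value) samples into fixed-width intervals,
/// keyed by the start of each interval.
fn group_by_interval(samples: &[(i64, f64)], interval: i64) -> BTreeMap<i64, Vec<f64>> {
    let mut buckets: BTreeMap<i64, Vec<f64>> = BTreeMap::new();
    for &(timestamp, value) in samples {
        let key = timestamp.div_euclid(interval) * interval;
        buckets.entry(key).or_default().push(value);
    }
    buckets
}

/// Assumed meaning of "diff": the largest difference between any two
/// values in the interval, i.e. max minus min.
fn diff(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some(max - min)
}

/// Exact nearest-rank percentile for `p` in (0, 100].
fn percentile(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank is 1-based: ceil(p / 100 * n), clamped into 1..=n.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

Calling `diff` and `percentile` with 75.0, 85.0, 95.0, and 99.0 on each value list returned by `group_by_interval` yields the metrics listed in the comment, one set per interval.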