Support ip_range field formatting (cidr, range)
Description
Borrowing from the concepts here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-api-fields, I would like to be able to format an ip_range field in either range or CIDR notation using "fields" in a query, so that the returned data is consistent rather than dependent on how it was indexed.
Since "fields" always returns an array of values, and documents can be indexed with a range that doesn't line up with a single CIDR block, such as { "gt": "192.168.1.22", "lte": "192.168.1.37" }, those edge cases should be returned in "cidr" format as a deaggregated array of CIDR netblocks, like:
[ "192.168.1.23/32", "192.168.1.24/29", "192.168.1.32/30", "192.168.1.36/31" ]. In "range" format, it should return a single inclusive range, as if the document had been indexed with "gte" and "lte", e.g. [ "192.168.1.23-192.168.1.37" ]
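Both conversions can be sketched with Python's standard-library ipaddress module. This is a minimal illustration, not how Elasticsearch implements ip_range: the helper names (inclusive_bounds, format_cidr, format_range) are hypothetical, and it assumes the range has both a lower and an upper bound. Exclusive "gt"/"lt" bounds are shifted by one address, and summarize_address_range yields the CIDR blocks that exactly cover the inclusive range.

```python
import ipaddress


def inclusive_bounds(rng):
    """Hypothetical helper: turn gt/gte and lt/lte keys into inclusive (first, last) addresses.

    Assumes both a lower and an upper bound are present (open-ended ranges are not handled).
    """
    if "gte" in rng:
        first = ipaddress.ip_address(rng["gte"])
    else:
        # "gt" is exclusive, so the first address in the range is one above it
        first = ipaddress.ip_address(rng["gt"]) + 1
    if "lte" in rng:
        last = ipaddress.ip_address(rng["lte"])
    else:
        # "lt" is exclusive, so the last address in the range is one below it
        last = ipaddress.ip_address(rng["lt"]) - 1
    return first, last


def format_cidr(rng):
    """Proposed "cidr" format: the range deaggregated into CIDR blocks that exactly cover it."""
    first, last = inclusive_bounds(rng)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def format_range(rng):
    """Proposed "range" format: one inclusive "first-last" string."""
    first, last = inclusive_bounds(rng)
    return [f"{first}-{last}"]


doc = {"gt": "192.168.1.22", "lte": "192.168.1.37"}
print(format_cidr(doc))   # ['192.168.1.23/32', '192.168.1.24/29', '192.168.1.32/30', '192.168.1.36/31']
print(format_range(doc))  # ['192.168.1.23-192.168.1.37']
```

The greedy split starts at the lowest address and repeatedly takes the largest aligned block that still fits, which is why a range that is not CIDR-aligned turns into several blocks.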
Example mapping:
PUT /networks
{
"mappings": {
"properties": {
"network": {
"type": "ip_range"
}
}
}
}
Example documents:
PUT /networks/_doc/1
{
"network": "192.168.1.0/24"
}
PUT /networks/_doc/2
{
"network": {
"gte": "192.168.0.0",
"lte": "192.168.0.255"
}
}
Example query, showing results today:
GET /networks/_search
{
"query": {
"match_all": {}
}
}
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "networks",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"network" : "192.168.1.0/24"
}
},
{
"_index" : "networks",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"network" : {
"gte" : "192.168.0.0",
"lte" : "192.168.0.255"
}
}
}
]
}
}
Example query, showing the desired "cidr" format behavior:
GET /networks/_search
{
"query": {
"match_all": {}
},
"fields": [
{
"field": "network",
"format": "cidr"
}
]
}
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "networks",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"network" : "192.168.1.0/24"
},
"fields" : {
"network": [
"192.168.1.0/24"
]
}
},
{
"_index" : "networks",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"network" : {
"gte" : "192.168.0.0",
"lte" : "192.168.0.255"
}
},
"fields" : {
"network": [
"192.168.0.0/24"
]
}
}
]
}
}
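The desired "cidr" values for the two example documents can be reproduced with a short Python sketch. The function cidr_field_values is hypothetical and only handles the two _source shapes used in this issue: a CIDR string (document 1) or an object with inclusive "gte"/"lte" bounds (document 2). Because document 2's range is aligned on a /24 boundary, it collapses to a single block.

```python
import ipaddress


def cidr_field_values(source_value):
    """Hypothetical "cidr" formatter for one ip_range _source value.

    Accepts a CIDR string or an object with inclusive gte/lte bounds;
    gt/lt and open-ended ranges are not handled here.
    """
    if isinstance(source_value, str):
        # Already a single CIDR block: parse it and return its normalized form
        return [str(ipaddress.ip_network(source_value))]
    first = ipaddress.ip_address(source_value["gte"])
    last = ipaddress.ip_address(source_value["lte"])
    # A range aligned on a CIDR boundary yields exactly one block
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


docs = {
    "1": "192.168.1.0/24",
    "2": {"gte": "192.168.0.0", "lte": "192.168.0.255"},
}
for doc_id, source in docs.items():
    print(doc_id, cidr_field_values(source))
# 1 ['192.168.1.0/24']
# 2 ['192.168.0.0/24']
```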
Example query, showing the desired "range" format behavior:
GET /networks/_search
{
"query": {
"match_all": {}
},
"fields": [
{
"field": "network",
"format": "range"
}
]
}
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "networks",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"network" : "192.168.1.0/24"
},
"fields" : {
"network": [
"192.168.1.0-192.168.1.255"
]
}
},
{
"_index" : "networks",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"network" : {
"gte" : "192.168.0.0",
"lte" : "192.168.0.255"
}
},
"fields" : {
"network": [
"192.168.0.0-192.168.0.255"
]
}
}
]
}
}
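The "range" values work the other way around: a CIDR string is expanded to its first and last address. A minimal Python sketch, assuming the same two _source shapes as the examples (range_field_values is a hypothetical name, not an Elasticsearch API):

```python
import ipaddress


def range_field_values(source_value):
    """Hypothetical "range" formatter for one ip_range _source value.

    A CIDR string is expanded to its first and last address; an object with
    inclusive gte/lte bounds is printed as-is.
    """
    if isinstance(source_value, str):
        net = ipaddress.ip_network(source_value)
        # network_address/broadcast_address are the first/last addresses in the block
        # (for IPv6 networks broadcast_address is simply the last address)
        first, last = net.network_address, net.broadcast_address
    else:
        first = ipaddress.ip_address(source_value["gte"])
        last = ipaddress.ip_address(source_value["lte"])
    return [f"{first}-{last}"]


print(range_field_values("192.168.1.0/24"))  # ['192.168.1.0-192.168.1.255']
print(range_field_values({"gte": "192.168.0.0", "lte": "192.168.0.255"}))  # ['192.168.0.0-192.168.0.255']
```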
Pinging @elastic/es-search (Team:Search)
I like this idea, but it seems to me it would be even better if it were used to aggregate information. One downside is that we may lose some fidelity: even though a CIDR block could encompass 64 IPs, the actual number that was indexed, or that we care about, may be only 57.
So, in a multi-bucket range count, we would overcount various ranges. We have a metadata field called _doc_count that alleviates this and could be used in conjunction with this new field type.
This may be helpful in the o11y space as a whole.
@felixbarny what do you think?
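The 64-versus-57 arithmetic in the comment above can be illustrated with a small Python sketch. The range 192.168.1.0-192.168.1.56 is a hypothetical example chosen to match those numbers, and smallest_covering_network is an illustrative helper, not an Elasticsearch API. Rounding the range up to a single enclosing block gives a /26 of 64 addresses, while the deaggregated CIDR list proposed in the description covers exactly 57.

```python
import ipaddress


def smallest_covering_network(first, last):
    """Smallest single CIDR block containing every address from first to last (inclusive).

    Assumes first <= last and that both are the same IP version.
    """
    last_ip = ipaddress.ip_address(last)
    net = ipaddress.ip_network(first)  # a bare address parses as a /32 (or /128 for IPv6)
    while last_ip not in net:
        net = net.supernet()  # drop one prefix bit, doubling the block size
    return net


first, last = "192.168.1.0", "192.168.1.56"
actual = int(ipaddress.ip_address(last)) - int(ipaddress.ip_address(first)) + 1
covering = smallest_covering_network(first, last)
exact = ipaddress.summarize_address_range(ipaddress.ip_address(first), ipaddress.ip_address(last))
print(actual, covering, covering.num_addresses)  # 57 192.168.1.0/26 64
print(sum(net.num_addresses for net in exact))   # 57
```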
@benwtrent the point I was making here is that consistency of the returned data is desirable. It's the same data, but it is represented seemingly at random to the end user/consumer, because the representation depends on how the data was indexed, even when the ranges are exactly the same.
If I index 192.168.0.0/24, it comes back as the CIDR block 192.168.0.0/24. If I index 192.168.0.0-192.168.0.255, it comes back as a range.
These are exactly the same value, just expressed in a different notation.
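One way to see that the two notations describe the same value is to normalize both to inclusive (first, last) bounds and compare them. A minimal Python sketch; as_bounds is a hypothetical helper that only accepts a CIDR string or a "first-last" string:

```python
import ipaddress


def as_bounds(value):
    """Hypothetical helper: normalize a CIDR or "first-last" string to inclusive (first, last)."""
    if "/" in value:
        net = ipaddress.ip_network(value)
        return net.network_address, net.broadcast_address
    first, last = value.split("-")
    return ipaddress.ip_address(first), ipaddress.ip_address(last)


print(as_bounds("192.168.0.0/24") == as_bounds("192.168.0.0-192.168.0.255"))  # True
```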
Thank you for the clarification @warewolf
Pinging @elastic/es-search-foundations (Team:Search Foundations)
BTW, I am no longer employed by the company where this feature/use case was needed. While I still believe it would be functionally useful, I no longer have a direct interest. I'm going to leave this feature request open, but will personally unsubscribe.