Why are there different object names in the response of /groupBy/boundary and /ratio/groupByBoundary?
It seem to be inconsistent to me that all groupBy endpoints except the ratio/groupBy/boundary are structured like this:
{
groupByResult:[
result:[ ... ],
groupByObject
]
}
only the ratio/groupBy/boundary uses 'groupByBoundaryResult' instead of 'groupByResult' like this:
{
groupByBoundaryResult:[
ratioResult:[ ... ],
groupByObject
]
}
The "normal" /groupBy/boundary does not use the groupByBoundaryResult but the common groupByResult as all the others.
To me there seems no need for different naming of the groupByResult.
Behind each of these result objects is normally a Java class that has specific attributes, which are reflected by the nested objects here one level deeper in the JSON. So as you can see the response of the /ratio endpoint includes a "ratioResult" JSON array, whereas the other ones include a general "result" JSON array. I think the different naming of the outter object here is a legacy issue from back when we removed the /share endpoint and all its corresponding classes. Back then I combined the remaining /groupBy endpoints to use that one groupByResult, but apparently either forgot to adapt them for the /ratio endpoint or had a reason in the code to distinguish them somehow. It's true though that from a users perspective, this is ambiguous and should not be like that. Will investigate 🔍
From a user perspective I would suggest for the sake of simple response parsing to use the same JSON paths for all groupBy respones such that you always have:
- response
- groupByResult
- groupByObject
- result
- {timestamp, value} OR {timestamp, value, value2, ratio} OR {fromTimestamp, toTimestamp, value}
I made an inspection on what response structure is used where and summarized it as follows:
Current (V1) aggregation response structures
1. DefaultAggregationResponse is used by:
- /elements/<area|count|length|perimeter>
- /elements/<area|count|length|perimeter>/density
- /users/count
- /users/count/density
- response
- result[]
- {timestamp, value} OR {fromTimestamp, toTimestamp, value}
2. RatioResponse is used by:
- /elements/count/ratio
- response
- ratioResult[]
- {timestamp, value, value2, ratio} # while value is the denominator and value2 is the nominator of the ratio (value2/value = ratio)
3. GroupByResponse is used by:
- /elements/<area|count|length|perimeter>/groupBy/<boundary|key|tag|type>
- /elements/<area|count|length|perimeter>/density/groupBy/<boundary|tag|type>
- /users/count/groupBy/<boundary|key|tag|type>
- /users/count/density/groupBy/<boundary|tag|type>
- response
- groupByResult[]
- groupByObject: string
- result[]
- {timestamp, value} OR {fromTimestamp, toTimestamp, value}
4. Nested GroupByResponse:
- /elements/count/groupBy/boundary/groupBy/tag
- /elements/count/density/groupBy/boundary/groupBy/tag
Note: the difference to normal groupBy is just the groupByObject to be an array of strings instead of a single string
- response
- groupByResult[]
- groupByObject: string[]
- result[]
- {timestamp, value}
5. RatioGroupByBoundaryResponse is used by:
- /elements/count/ratio/groupBy/boundary
- response
- groupByBoundaryResult[]
- groupByObject: string
- ratioResult[]
- {timestamp, value, value2, ratio} # while value is the denominator and value2 is the nominator of the ratio (value2/value = ratio)
I think for a future v2 version we should consider to simplify this into only 2 structures as proposed here:
Proposed reduction to only two response structures
Only keep two main structures which contain the 3 different result types (snapshot-value, interval-value, snapshot-ratio)
1. Plain results without grouping
- response
- result[]
- {timestamp, value} OR {fromTimestamp, toTimestamp, value} OR {timestamp, value, value2, ratio}
2. Grouped results with one, two and potential N groupings
corresponds to the current structure for nested groupBy, which could also be used for the "normal" groupBy) for more than 2 groupings one would just have to add further strings to the groupByObject-array for the current normat single grouping, there would be just one entry in the array
- response
- groupByResult[]
- groupByObject: string[]
- result[]
- {timestamp, value} OR {fromTimestamp, toTimestamp, value} OR {timestamp, value, value2, ratio}
I just took a look in the code and tweaked it a bit. The RatioResult can indeed be directly included in the standard Result object for plain /ratio requests. For /ratio/groupBy/boundary requests though, it's not so easily changeable because we use the timestamp in the response for the GeoJSON response format.
See e.g. such a request The timestamp is used here as identification together with the spatial objects. I don't say that it's not possible though. Can sit down again next week on some day and continue with it. And this could also be something for a 1.1 release, don't think that we'd have to wait for v2 to include this update in the running instance.
And this could also be something for a 1.1 release, don't think that we'd have to wait for v2 to include this update in the running instance.
:thinking: IMHO, this should be only changed in a major release, even though this is indeed a very small change and wouldn't change the logic behind the resource: It does change the data interface however and for an end-user it would be very annoying if a working query "suddenly" produces a result which cannot be processed with the existing scripts any more. I experienced a similar issue recently in another project, where a single escape character got changed in the data from an external data provider causing hard to debug follow-up errors down the line (https://github.com/hotosm/osm-analytics-cruncher/issues/20#issuecomment-662340775).
For /ratio/groupBy/boundary requests though, it's not so easily changeable because we use the timestamp in the response for the GeoJSON response format. […] The timestamp is used here as identification together with the spatial objects.
I don't see why this would be much different from the non-ratio request with groupBy/boundary (example). Isn't that one also using the timestamp to distinguish the different result features from each other? Yet, it can still use the same groupByResult field in the format=json output. Or am I overlooking something?