geoparquet
geoparquet copied to clipboard
Correct the 'bbox' description for 3D geometries
In https://github.com/opengeospatial/geoparquet/pull/45 we explicitly allowed 3D geometries, but in that PR I forgot to update the bbox
description to reflect this.
@Jesus89 do you know how the json schema needs to be updated for this? (now it harcoded 4 items, but it needs to be 4 or 6)
I think we can use oneOf and allow both options.
If at some point, we include an explicit param for 3D, the If-Then-Else condition could be used too.
A proposal:
{
"bbox": {
"type": "array",
"description": "Bounding Box of the geometries in the file, formatted according to RFC 7946, section 5.",
"oneOf": [
{
"items": [
{
"type": "number",
"description": "The westmost constant longitude line that bounds the rectangle (xmin)."
},
{
"type": "number",
"description": "The minimum constant latitude line that bounds the rectangle (ymin)."
},
{
"type": "number",
"description": "The eastmost constant longitude line that bounds the rectangle (xmax)."
},
{
"type": "number",
"description": "The maximum constant latitude line that bounds the rectangle (ymax)."
}
],
"minItems": 4,
"additionalItems": false
},
{
"items": [
{
"type": "number",
"description": "The westmost constant longitude line that bounds the parallelepiped (xmin)."
},
{
"type": "number",
"description": "The minimum constant latitude line that bounds the parallelepiped (ymin)."
},
{
"type": "number",
"description": "The minimum constant altitude line that bounds the parallelepiped (zmin)."
},
{
"type": "number",
"description": "The eastmost constant longitude line that bounds the parallelepiped (xmax)."
},
{
"type": "number",
"description": "The maximum constant latitude line that bounds the parallelepiped (ymax)."
},
{
"type": "number",
"description": "The maximum constant altitude line that bounds the parallelepiped (zmax)."
}
],
"minItems": 6,
"additionalItems": false
}
]
}
}
Note that we should review the items' descriptions because of the CRS.
A proposal:
~~It looks like you're trying to use items
as a tuple. Does that work in draft 7?~~ Ah I see, your usage is correct. It changed from items
to prefixItems
in draft 2019-09
https://json-schema.org/understanding-json-schema/reference/array.html#tuple-validation
Given that we split up the 2D and 3D case in the schema, it probably wouldn't hurt to set maxItems: 4
and maxItems: 6
respectively? Specifically I wonder, if you had [0, 0, 0, 0, 'a', 'a']
, could that match the first branch, because it has four numbers and a minimum of four elements? (This would be a good case for a future test suite 🙂 )
Note that we should review the items' descriptions because of the CRS.
+1. Also, I feel like we need to discuss or document whether this bbox is in OGC:WGS84 or in the native CRS of the data. Right now the spec says:
OPTIONAL Bounding Box of the geometries in the file, formatted according to RFC 7946, section 5.
But looking at RFC 7946, section 5, that says:
The axes order of a bbox follows the axes order of geometries.
If I interpret that correctly, if data is saved as EPSG:4326
, that means the order of the bbox would be
[minLatitude, minLongitude, maxLatitude, maxLongitude]
which is contrary to the JSON schema at least. And I think most often bboxes are [minLongitude, minLatitude, maxLongitude, maxLatitude]
, so we should at least document what behavior we expect.
Per some of the CRS related discussions, and a desire to support non CRS-aware readers, it seems important that the axis order of the bounds object remains constant regardless of CRS: [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax]. So it seems the right thing to do is remove the reference to the RFC, and simply state the expected order.
Our specification is very explicit about axis order, which is always the same regardless of the CRS you use:
https://github.com/opengeospatial/geoparquet/blob/a3c1212b59bce9ea9220897040bf38e3d6381fff/format-specs/geoparquet.md?plain=1#L124-L128
So I don't think there is any ambiguity for the bbox following "axis order of the geometries", as this is always the same.
But it would indeed be useful to clarify that the bbox is always in the same coordinate system as the data.
And the reference to the GeoJSON spec is I think still useful to refer to for the handling of antimeridian, poles, etc
The discussion is a bit split, but see my last comment at https://github.com/opengeospatial/geoparquet/issues/92#issuecomment-1130174010. So I am not fully sure myself this PR should be merged as is in its current state (requiring 3D bbox for 3D data).
Ah, I was remembering that comment but then couldn't figure out where I saw it.
I think if we're not feeling sure about it then we should just hold off for now. See how 3d data in geoparquet plays out and wait for someone to make the case. I'll push out the milestone on this one.
This PR creates an inconsistency, since the referenced GeoJSON specification does explicitly ask for the bbox to include all used dimensions. To prevent confusion it might be good to explicitly call out that deviation.
Consensus on call 9/24 seems to be we should just align with GeoJSON and do the same thing they do. @jorisvandenbossche to update the schema.
I think the schema can be greatly simplified as the descriptions are just annotations anyway and are not used for validation. I assume 2D and 3D is allowed. So something like:
{
"description": "Bounding Box of the geometries in the file, formatted according to RFC 7946, section 5.",
"type":"array",
"items":{
"type":"number"
},
"oneOf":[
{
"description": "2D bbox consisting of (xmin, ymin, xmax, ymax)",
"minItems":4,
"maxItems":4
},
{
"description": "3D bbox consisting of (xmin, ymin, zmin, xmax, ymax, zmax)",
"minItems":6,
"maxItems":6
}
]
}
GeoJSON does not require that geometries have only 2 or 3 dimensions. It recommends avoiding more than 3, but doesn't forbid it (see the "SHOULD NOT" in Section 3.1.1.).
In addition, the bbox
member is required to have the range of values for all dimensions (not just the first 2 or 3). Is it intentional that GeoParquet adds an additional restriction on top of this (allowing only 4 or 6 item bboxes)?
Update: suggested change in #144
In addition, the bbox member is required to have the range of values for all dimensions (not just the first 2 or 3). Is it intentional that GeoParquet adds an additional restriction on top of this (allowing only 4 or 6 item bboxes)?
We only allow 2D or 3D geometries, at the moment (although that is a bit hidden in the description of the "encoding" keyword):
https://github.com/opengeospatial/geoparquet/blob/085bb6f9fb50b31e38b4d711d867c5366598108f/format-specs/geoparquet.md?plain=1#L113
So that means in practice you can only have 4 or 6 values for the bbox?
We only allow 2D or 3D geometries, at the moment (although that is a bit hidden in the description of the "encoding" keyword)
I completely missed that.