datacube-core
Valid datasets not returned for some spatial queries (Global Datasets/Polygon Intersection)
Valid datasets that DO overlap with the query polygon might be rejected by the post-query filtering step. The problem is in `select_datasets_inside_polygon`:
https://github.com/opendatacube/datacube-core/blob/0bf503fe5b92acbd8125b042de8b7123866a73a6/datacube/api/core.py#L676-L682
Specifically, `intersects(polygon, dataset.extent.to_crs(query_crs))` is not a valid way to answer "do two polygons defined in two different projections overlap?"
`poly.to_crs(query_crs)` simply projects the points along the perimeter of the polygon into the new reference frame, ignoring the fact that some of those points might not be representable in the new projection.
An example where this breaks down is a "global dataset" (covering the entire globe) defined in EPSG:4326 with extent lon: -180..180, lat: -90..90. A polygon like that won't project cleanly into pretty much any other CRS, so the dataset will be rejected even though it overlaps with every possible query.
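A minimal reproduction sketch of that failure mode, assuming the `datacube.utils.geometry` API (`box`, `CRS`, `intersects`, `to_crs`) used by the linked code; the EPSG:3577 query coordinates are made up for illustration:

```python
# Sketch only: assumes datacube.utils.geometry as in the linked commit.
from datacube.utils import geometry

# "Global" dataset extent in EPSG:4326
dataset_extent = geometry.box(-180, -90, 180, 90, crs=geometry.CRS('EPSG:4326'))

# Query polygon in Australian Albers (EPSG:3577). Any query overlaps a
# global dataset, so this should always intersect.
query_crs = geometry.CRS('EPSG:3577')
query_poly = geometry.box(1_000_000, -2_000_000, 2_000_000, -1_000_000,
                          crs=query_crs)

# Mirrors select_datasets_inside_polygon: re-project the dataset extent into
# the query CRS, then test. The re-projected global footprint is degenerate,
# so the test can come back False and the dataset gets dropped.
reprojected = dataset_extent.to_crs(query_crs)
print(reprojected.is_valid)                           # often False here
print(geometry.intersects(query_poly, reprojected))   # can be False despite true overlap
```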
One option could be to bring both `dataset.extent` and the query polygon to EPSG:4326 first and then check for intersection; any CRS coordinate has a valid EPSG:4326 footprint. But this method should be opt-in for the user via a function flag, since large-area polygons in EPSG:4326 suffer from distortions.
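A sketch of that proposal; `intersects_via_4326` is a hypothetical helper, not an existing datacube-core function, and it inherits the wrap-around and precision problems discussed in the next comment:

```python
# Hypothetical helper implementing the suggestion above: bring both
# geometries to EPSG:4326 before testing. Not part of datacube-core.
from datacube.utils import geometry

def intersects_via_4326(dataset_extent: geometry.Geometry,
                        query_poly: geometry.Geometry) -> bool:
    wgs84 = geometry.CRS('EPSG:4326')
    return geometry.intersects(dataset_extent.to_crs(wgs84),
                               query_poly.to_crs(wgs84))
```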
@tushar10sh it's not that simple: conversion to lon/lat suffers from lon=180 wrap-around issues. There are also precision issues. Two polygons that overlap slightly in their native CRS might not overlap once translated to another projection, if you haven't added enough points in just the right places. We are not transforming an infinite number of perimeter points, but rather a bunch of line segments; those segments need to be small enough, which means a lot of them, which means slow.
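To make that trade-off concrete, here is a hedged sketch of densifying a perimeter before re-projecting (shapely >= 2.0 and pyproj assumed; the polygon, target CRS, and segment length are illustrative):

```python
# Densify the perimeter before re-projecting so straight segments track the
# curved image of each edge more closely. shapely >= 2.0 and pyproj assumed.
import shapely
from shapely.geometry import box
from shapely.ops import transform
from pyproj import Transformer

poly = box(140.0, -40.0, 150.0, -30.0)                    # degrees, EPSG:4326
dense = shapely.segmentize(poly, max_segment_length=0.1)  # ~0.1 degree steps

to_albers = Transformer.from_crs('EPSG:4326', 'EPSG:3577', always_xy=True)
projected = transform(to_albers.transform, dense)

# Smaller max_segment_length -> more vertices -> better boundary fidelity,
# but every extra vertex is another point to transform: accuracy vs. speed.
print(len(projected.exterior.coords))
```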
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.