Develop geospatial query abstraction

Open aashish24 opened this issue 9 years ago • 9 comments

For now we can use Postgres.

aashish24 avatar Feb 10 '16 20:02 aashish24

Can you elaborate on this? I'm not sure I understand it. Geospatial queries can be done in postgres for data stored in postgres. But what about data in files (shapefile, geojson, etc)?

My current idea is that vector data stored in files could be queried via geopandas, and data in postgres could be queried directly in postgres. If one wants to run a query involving both file-based data and postgres data together, then either the postgres data will need to be retrieved and loaded into a geopandas dataframe, or the file data will need to be loaded into postgres.

mbertrand avatar Feb 11 '16 17:02 mbertrand

This was the impression I was under as well. Keep in mind also that geopandas already has a function for reading from PostGIS (GeoDataFrame.from_postgis).

kotfic avatar Feb 11 '16 20:02 kotfic

PostGIS support branch is here: https://github.com/OpenDataAnalytics/gaia/tree/postgis.

My initial approach is to have alternate compute methods in the Process class if all inputs are of type PostGisIO. If true, do all the computation in PostGIS; if false, load into GeoPandas first and do the calculation there.

https://github.com/OpenDataAnalytics/gaia/blob/b713d6ffb5817f92b5186a00ffdd38b3c5a62a34/gaia/processes_vector.py#L37-L50
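
As a rough illustration of the dispatch described above (not the code in the linked branch), here is a minimal sketch. `Process` and `PostGisIO` are the names used in this thread; `FileIO`, `all_postgis`, `calc_postgis`, and `calc_pandas` are hypothetical names made up for the example, and the two compute methods are placeholders that only report which path was chosen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileIO:
    """Hypothetical stand-in for a file-backed input (shapefile, GeoJSON, ...)."""
    uri: str


@dataclass
class PostGisIO:
    """Hypothetical stand-in for an input backed by a PostGIS table."""
    table: str


class Process:
    """Chooses a compute path from the types of its inputs."""

    def __init__(self, inputs: List[object]):
        self.inputs = inputs

    def all_postgis(self) -> bool:
        # An empty input list should not count as "all PostGIS".
        return bool(self.inputs) and all(
            isinstance(i, PostGisIO) for i in self.inputs
        )

    def compute(self) -> str:
        # Stay in the database only when every input already lives there.
        if self.all_postgis():
            return self.calc_postgis()
        return self.calc_pandas()

    def calc_postgis(self) -> str:
        # Placeholder: a real implementation would build and run SQL here.
        return "postgis"

    def calc_pandas(self) -> str:
        # Placeholder: a real implementation would load every input into
        # a GeoDataFrame and do the calculation there.
        return "geopandas"
```

With this shape, a mixed set of inputs such as `Process([PostGisIO("roads"), FileIO("zones.geojson")]).compute()` falls through to the GeoPandas path, which matches the "if false, load into GeoPandas first" rule.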

mbertrand avatar Feb 26 '16 14:02 mbertrand

@mbertrand here is what we need (all of these are implemented by PostGIS)

  1. findNearBy(location): location ==> (lat, lon)
  2. distance(pt1, pt2) ==> pt1 and pt2 are points
  3. contains(pt, distance)
  4. intersects(geometry) ==> return all data that intersects with geom
  5. area(geometry)
  6. centroid(geometry)
  7. cross(geometry, geometry)
  8. overlaps(geometry, geometry)
  9. length(geometry)
  10. equals(geometry, geometry)
  11. disjoints(geometry, geometry)
  12. area(geometry)
  13. touches(geometry, geometry)
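
For reference, the list above can be lined up with the standard PostGIS functions roughly as follows. This is only a sketch: the pairing of `findNearBy` with `ST_DWithin` is an assumption about the intended semantics, `sql_call` is a hypothetical helper, and the `%s` placeholders assume a driver that uses that parameter style (psycopg2 does).

```python
# Requested operation -> (PostGIS function, number of arguments).
# Mapping findNearBy to ST_DWithin(geom, geom, distance) is an assumption.
POSTGIS_FUNCTIONS = {
    "findNearBy": ("ST_DWithin", 3),
    "distance": ("ST_Distance", 2),
    "contains": ("ST_Contains", 2),
    "intersects": ("ST_Intersects", 2),
    "area": ("ST_Area", 1),
    "centroid": ("ST_Centroid", 1),
    "cross": ("ST_Crosses", 2),
    "overlaps": ("ST_Overlaps", 2),
    "length": ("ST_Length", 1),
    "equals": ("ST_Equals", 2),
    "disjoints": ("ST_Disjoint", 2),
    "touches": ("ST_Touches", 2),
}


def sql_call(operation: str) -> str:
    """Build a SQL function call with one placeholder per argument.

    Hypothetical helper: the values themselves are left for the
    database driver to bind, so nothing is interpolated into the SQL.
    """
    if operation not in POSTGIS_FUNCTIONS:
        raise ValueError(f"unknown operation: {operation}")
    func, arity = POSTGIS_FUNCTIONS[operation]
    return f"{func}({', '.join(['%s'] * arity)})"
```

For example, `sql_call("touches")` gives `ST_Touches(%s, %s)`, which could be embedded in a larger `SELECT` statement.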

aashish24 avatar Feb 29 '16 22:02 aashish24

sorry, distance would be distance(geometry, geometry) as well and so is contains

aashish24 avatar Feb 29 '16 22:02 aashish24

@aashish24 please see notes/updated inputs below, let me know if they are what you had in mind:

  1. findNearby(point, dataset, max_distance): point = X,Y coordinate, dataset=postgis table or other vector data source having 1+ features, max_distance: maximum distance from point to search
  2. distance(dataset, dataset): find the minimum distance from each feature in the first dataset to a feature in the second dataset.
  3. contains == within(dataset, dataset): find features in 1st dataset contained within features of 2nd dataset
  4. intersects(dataset, dataset): find features of 1st dataset that intersect with features of 2nd dataset
  5. area(dataset): calculate area of each feature in dataset, in units of dataset projection
  6. centroid(dataset): find the centroid of each feature, or the centroid of all features combined.
  7. crosses(dataset, dataset): return features of first dataset that cross features of 2nd dataset. Postgis only, not available in geopandas (but very similar to intersects).
  8. overlaps(dataset, dataset) - find features of 1st dataset that overlap features of 2nd dataset. PostGIS only, not available with geopandas, but very similar to intersects.
  9. length(dataset): calculate the lengths of features in dataset, assuming line/multiline features
  10. equals(dataset, dataset): ?? return true if all features in one dataset are the same as the second?
  11. disjoints == difference(dataset, dataset)? ie return features in 1st dataset that do not intersect features of 2nd dataset
  12. - repeat of 5
  13. touches(dataset, dataset): returns features of first dataset that touch features of second dataset. Not available with geopandas, postgis only.
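
To make the dataset-level semantics of items 1 and 2 concrete, here is a minimal pure-Python sketch. It assumes point features given as plain `(x, y)` tuples in a projected coordinate system and uses planar Euclidean distance; it does not use PostGIS or geopandas, and the function names `find_nearby` and `min_distances` are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def find_nearby(point: Point, dataset: Sequence[Point],
                max_distance: float) -> List[Point]:
    """Return the features of dataset within max_distance of point."""
    return [p for p in dataset if math.dist(point, p) <= max_distance]


def min_distances(first: Sequence[Point],
                  second: Sequence[Point]) -> List[float]:
    """For each feature in first, the distance to its nearest feature in second."""
    if not second:
        raise ValueError("second dataset must contain at least one feature")
    return [min(math.dist(p, q) for q in second) for p in first]
```

Note that `min_distances` returns one value per feature of the first dataset, so the result has the same length and order as that dataset. Distances come out in the units of the coordinates, in the same way that area is reported in the units of the dataset projection.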

mbertrand avatar Mar 01 '16 14:03 mbertrand

findNearby(point, dataset, max_distance): point = X,Y coordinate, dataset=postgis table or other vector data source having 1+ features, max_distance: maximum distance from point to search

+1

distance(dataset, dataset) - find minimum distance from each feature in first dataset to a feature in the second dataset.

+1

contains == within(dataset, dataset): find features in 1st dataset contained within features of 2nd dataset

+1

  4. intersects(dataset, dataset): find features of 1st dataset that intersect with features of 2nd dataset
  5. area(dataset): calculate area of each feature in dataset, in units of dataset projection
  6. centroid(dataset): find the centroid of each feature, or the centroid of all features combined.
  7. crosses(dataset, dataset): return features of first dataset that cross features of 2nd dataset. Postgis only, not available in geopandas (but very similar to intersects).
  8. overlaps(dataset, dataset) - find features of 1st dataset that overlap features of 2nd dataset. PostGIS only, not available with geopandas, but very similar to intersects.
  9. length(dataset): calculate the lengths of features in dataset, assuming line/multiline features
  10. equals(dataset, dataset): ?? return true if all features in one dataset are the same as the second?
  11. disjoints == difference(dataset, dataset)? ie return features in 1st dataset that do not intersect features of 2nd dataset
  12. - repeat of 5
  13. touches(dataset, dataset): returns features of first dataset that touch features of second dataset. Not available with geopandas, postgis only.

+1

Looks great to me. Thanks @mbertrand. I have a few questions on the implementation, but we can talk about them over the hangout.

aashish24 avatar Mar 01 '16 14:03 aashish24

@aashish24 @kotfic Here is an ipython notebook demonstrating most of these processes: https://gist.github.com/mbertrand/e83b7d62ce74fe9e6c53

mbertrand avatar Mar 04 '16 22:03 mbertrand

https://github.com/OpenDataAnalytics/gaia/pull/58

mbertrand avatar Mar 07 '16 21:03 mbertrand