intake-stac
intake-stac copied to clipboard
How to integrate intake-stac and geopandas
Converting STAC catalogs (JSON) to intake catalogs is great for facilitating browsing images with intake.gui and loading remote data directly into xarray objects. When loading data returned from a dynamic API like sat-search (https://github.com/radiantearth/stac-spec/issues/691), we could also take advantage of Geopandas for querying the catalog and visualizing metadata, but currently (0.2.2) the integration is a bit awkward:
properties = ["eo:row=027",
"eo:column=047",
"landsat:tier=T1"]
results = satsearch.Search.search(collection='landsat-8-l1',
sort=['<datetime'], #earliest scene first
property=properties)
items = results.items()
It would be great to easily load results.items() into a geodataframe directly
results.items().geojson() # returns a python dictionary
#gf = gpd.GeoDataFrame(results.items().geojson()) #ValueError: arrays must all be same length
This works and provides a very convenient tabular HTML display
items.save('subset.geojson')
gf = gpd.read_file('subset.geojson')
display(gf)
![Screenshot 2019-12-15 17 04 20](https://user-images.githubusercontent.com/3924836/70872429-0389bf00-1f5d-11ea-9703-fd88b36ec9f1.png)
We could consider adding the same geopandas HTML view directly to the catalog and LocalCatalogEntry objects
# prints <Intake catalog: <class 'satstac.itemcollection.ItemCollection'>>
cat = intake.open_stac_item_collection(results.items())
display(cat)
# also HTML table?
display(cat[sceneid].metadata)
![Screenshot 2019-12-15 17 05 47](https://user-images.githubusercontent.com/3924836/70872449-26b46e80-1f5d-11ea-8cdf-14535701d52c.png)
Enabling geopandas-like methods on an Intake catalog: <class 'satstac.itemcollection.ItemCollection
would be very useful! Maybe it's best to just add a cat.to_geopandas() function to enable things like the following?:
gf = cat.to_geopandas()
gf.iloc[0]
gf[gf["eo:cloud_cover"] < 50]
#gf.query("eo:cloud_cover < 50") # Geopandas doesn't like ":" in column names
gf.rename(columns={"eo:cloud_cover": "cloud_cover"}).query("cloud_cover < 50")
gf.plot() #plots footprint polygons of STAC items
I think this is a great idea. Intake-esm does something similar, providing a pandas DataFrame as an attribute on the catalog. This allows intake-esm to use Pandas query logic for search/subsetting of the catalog. The awkward part is figuring out how to move between the two forms of the catalog (intake and DataFrame).
Perhaps a good starting point would be to_/from_geopandas
methods for intake-stac.
What is the expected behavior of from_geopandas()
method? As I was working on implementing this method, it occurred to me that when the following is performed:
items.save('subset.geojson')
gf = gpd.read_file('subset.geojson')
the created GeoDataframe
doesn't include all the information present in the subset.geojson
file, specifically the extra information in the assets
sections of the GeoJSON:
"assets": {
"index": {
"type": "text/html",
"title": "HTML index page",
"href": "https://landsat-pds.s3.amazonaws.com/c1/L8/162/068/LC08_L1TP_162068_20190530_20190605_01_T1/LC08_L1TP_162068_20190530_20190605_01_T1_MTL.txt"
},
"thumbnail": {
"title": "Thumbnail image",
"type": "image/jpeg",
"href": "https://landsat-pds.s3.amazonaws.com/c1/L8/162/068/LC08_L1TP_162068_20190530_20190605_01_T1/LC08_L1TP_162068_20190530_20190605_01_T1_thumb_large.jpg"
},
I presume that this is extra information (useful in STAC context), but unrecognizable to GeoPandas. So, how do we construct the <Intake catalog: <class 'satstac.itemcollection.ItemCollection'>>
from a GeoDataFrame that doesn't have all necessary information? Am I missing something?
Just wanted to note the current syntax here is giving a futurewarning: https://github.com/intake/intake-stac/blob/c1a65ce1317d76929328fbdd554893cc10a4f6a4/intake_stac/catalog.py#L203-L204
FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
return _prepare_from_string(" ".join(pjargs))
it's an easy fix, we just need to chage to crs='EPSG:4326'