intake-stac icon indicating copy to clipboard operation
intake-stac copied to clipboard

How to integrate intake-stac and geopandas

Open scottyhq opened this issue 4 years ago • 3 comments

Converting STAC catalogs (JSON) to intake catalogs is great for facilitating browsing images with intake.gui and loading remote data directly into xarray objects. When loading data returned from a dynamic API like sat-search (https://github.com/radiantearth/stac-spec/issues/691), we could also take advantage of Geopandas for querying the catalog and visualizing metadata, but currently (0.2.2) the integration is a bit awkward:

properties =  ["eo:row=027",
               "eo:column=047",
               "landsat:tier=T1"] 
results = satsearch.Search.search(collection='landsat-8-l1', 
                        sort=['<datetime'], #earliest scene first
                        property=properties)
items = results.items()

It would be great to easily load results.items() into a geodataframe directly

results.items().geojson() # returns a python dictionary
#gf = gpd.GeoDataFrame(results.items().geojson()) #ValueError: arrays must all be same length

This works and provides a very convenient tabular HTML display

items.save('subset.geojson')
gf = gpd.read_file('subset.geojson')
display(gf)
Screenshot 2019-12-15 17 04 20

We could consider adding the same geopandas HTML view directly to the catalog and LocalCatalogEntry objects

# prints <Intake catalog: <class 'satstac.itemcollection.ItemCollection'>>
cat = intake.open_stac_item_collection(results.items())
display(cat)

# also HTML table?
display(cat[sceneid].metadata)
Screenshot 2019-12-15 17 05 47

Enabling geopandas-like methods on an Intake catalog: <class 'satstac.itemcollection.ItemCollection would be very useful! Maybe it's best to just add a cat.to_geopandas() function to enable things like the following?:

gf = cat.to_geopandas()
gf.iloc[0]
gf[gf["eo:cloud_cover"] < 50]
#gf.query("eo:cloud_cover < 50") # Geopandas doesn't like ":" in column names
gf.rename(columns={"eo:cloud_cover": "cloud_cover"}).query("cloud_cover < 50")
gf.plot() #plots footprint polygons of STAC items

scottyhq avatar Dec 16 '19 01:12 scottyhq

I think this is a great idea. Intake-esm does something similar, providing a pandas DataFrame as an attribute on the catalog. This allows intake-esm to use Pandas query logic for search/subsetting of the catalog. The awkward part is figuring out how to move between the two forms of the catalog (intake and DataFrame).

Perhaps a good starting point would be to_/from_geopandas methods for intake-stac.

jhamman avatar Dec 16 '19 18:12 jhamman

What is the expected behavior of from_geopandas() method? As I was working on implementing this method, it occurred to me that when the following is performed:

items.save('subset.geojson')
gf = gpd.read_file('subset.geojson')

the created GeoDataframe doesn't include all the information present in the subset.geojson file, specifically the extra information in the assets sections of the GeoJSON:

"assets": {
            "index": {
                "type": "text/html",
                "title": "HTML index page",
                "href": "https://landsat-pds.s3.amazonaws.com/c1/L8/162/068/LC08_L1TP_162068_20190530_20190605_01_T1/LC08_L1TP_162068_20190530_20190605_01_T1_MTL.txt"
            },
            "thumbnail": {
                "title": "Thumbnail image",
                "type": "image/jpeg",
                "href": "https://landsat-pds.s3.amazonaws.com/c1/L8/162/068/LC08_L1TP_162068_20190530_20190605_01_T1/LC08_L1TP_162068_20190530_20190605_01_T1_thumb_large.jpg"
            },

I presume that this is extra information (useful in STAC context), but unrecognizable to GeoPandas. So, how do we construct the <Intake catalog: <class 'satstac.itemcollection.ItemCollection'>> from a GeoDataFrame that doesn't have all necessary information? Am I missing something?

andersy005 avatar Jan 17 '20 18:01 andersy005

Just wanted to note the current syntax here is giving a futurewarning: https://github.com/intake/intake-stac/blob/c1a65ce1317d76929328fbdd554893cc10a4f6a4/intake_stac/catalog.py#L203-L204

FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))

it's an easy fix, we just need to chage to crs='EPSG:4326'

scottyhq avatar Nov 20 '20 01:11 scottyhq