torchgeo
STAC API dataset
SpatioTemporal Asset Catalogs (STACs) are a way to organize geospatial datasets. STAC APIs let users query huge STAC catalogs by location, date/time, and other metadata.
For example, the Microsoft Planetary Computer runs a STAC API that lets users search catalogs containing all Sentinel-2 imagery, all Landsat 8 imagery, etc. The following code uses the pystac_client library to query the Planetary Computer STAC API and return metadata and links to GeoTIFFs for relevant Sentinel-2 scenes:
```python
from pystac_client import Client

# Bounding polygon (GeoJSON, lon/lat order) around the area of interest
area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-148.56536865234375, 60.80072385643073],
            [-147.44338989257812, 60.80072385643073],
            [-147.44338989257812, 61.18363894915102],
            [-148.56536865234375, 61.18363894915102],
            [-148.56536865234375, 60.80072385643073],
        ]
    ],
}
# STAC datetime interval in "start/end" form
time_of_interest = "2019-06-01/2019-08-01"

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=area_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},  # only scenes with < 10% cloud cover
)
# ItemSearch.items() replaces the deprecated get_items() in newer pystac-client releases
items = list(search.items())
print(f"Returned {len(items)} Items")
```
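Each returned Item carries its assets as a mapping from asset key to an object with an `href` and a media `type`. A minimal sketch of pulling out just the GeoTIFF links, assuming the Item has been converted to a plain dict (e.g. via pystac's `Item.to_dict()`) and that GeoTIFF/COG assets declare an `image/tiff...` media type. The helper name `geotiff_hrefs` and the example item below are hypothetical:

```python
from typing import Any, Dict


def geotiff_hrefs(item: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: map asset key -> href for the GeoTIFF assets of one STAC Item dict."""
    hrefs = {}
    for key, asset in item.get("assets", {}).items():
        media_type = asset.get("type", "")
        # GeoTIFF / Cloud-Optimized GeoTIFF assets advertise an "image/tiff" media type
        if media_type.startswith("image/tiff"):
            hrefs[key] = asset["href"]
    return hrefs


# Made-up Item dict for illustration only
example_item = {
    "type": "Feature",
    "id": "example-scene",
    "assets": {
        "B04": {
            "href": "https://example.com/B04.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        },
        "thumbnail": {"href": "https://example.com/thumb.png", "type": "image/png"},
    },
}
print(geotiff_hrefs(example_item))  # {'B04': 'https://example.com/B04.tif'}
```

With real results this would be called as `geotiff_hrefs(item.to_dict())` for each item. Note that Planetary Computer hrefs still need to be signed before the files can be read.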
We'd like to build a `STACAPIDataset` that essentially wraps `catalog.search(...)`, creates a `RasterDataset` from the returned items, and otherwise behaves like a normal PyTorch dataset (signing assets as needed, etc.). A signature like `STACAPIDataset(api_endpoint, root="data/", max_cache_size=None, **query_parameters_to_pystac_client)` would be a good starting point.
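A dependency-free sketch of the wrapping logic, under loud assumptions: `search_fn` is a hypothetical stand-in for `Client.open(api_endpoint).search(**query).items()`, `sign` is a hypothetical stand-in for an asset-signing step, and `ItemRecord` is a toy replacement for a pystac Item. It does not build a `RasterDataset`; it only shows running the query once, keeping items that have the wanted asset, and signing hrefs lazily when a sample is requested. Integer indexing is used purely for simplicity:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclass
class ItemRecord:
    """Toy stand-in for a pystac Item: an id plus asset key -> href."""
    id: str
    assets: Dict[str, str]


class STACAPIDatasetSketch:
    """Hypothetical sketch of the proposed STACAPIDataset's query/index/sign flow."""

    def __init__(
        self,
        search_fn: Callable[..., Iterable[ItemRecord]],
        asset_key: str,
        sign: Optional[Callable[[str], str]] = None,
        **query: Any,
    ) -> None:
        # Run the STAC query once, up front; keep only items that have the wanted asset
        self.items: List[ItemRecord] = [
            item for item in search_fn(**query) if asset_key in item.assets
        ]
        self.asset_key = asset_key
        # Default to no signing (identity) when no signer is supplied
        self.sign = sign or (lambda href: href)

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Dict[str, str]:
        # Sign lazily, only when a sample is actually requested
        item = self.items[index]
        return {"id": item.id, "href": self.sign(item.assets[self.asset_key])}


# Hypothetical offline search function in place of a real STAC API call
def fake_search(**query: Any) -> List[ItemRecord]:
    assert query["collections"] == ["sentinel-2-l2a"]
    return [
        ItemRecord("a", {"B04": "https://example.com/a_B04.tif"}),
        ItemRecord("b", {"B08": "https://example.com/b_B08.tif"}),
    ]


ds = STACAPIDatasetSketch(
    fake_search,
    asset_key="B04",
    sign=lambda href: href + "?token=xyz",
    collections=["sentinel-2-l2a"],
)
print(len(ds))  # 1
print(ds[0])  # {'id': 'a', 'href': 'https://example.com/a_B04.tif?token=xyz'}
```

Injecting the search and signing callables keeps the sketch testable without network access; a real implementation would presumably construct the pystac_client `Client` from `api_endpoint` internally.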
As an implementation detail, it may be a good idea to cache accessed data in a local directory.
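One way such a cache could work, sketched with the standard library only: files are stored under a root directory, named by a hash of their href, and the least-recently-used files are deleted once the total size exceeds `max_cache_size` (interpreted here as bytes, which is an assumption). `LocalAssetCache` and its `fetch` callable are hypothetical; `fetch` stands in for the real download of a signed href:

```python
import hashlib
from collections import OrderedDict
from pathlib import Path
from typing import Callable, Optional


class LocalAssetCache:
    """Hypothetical on-disk LRU cache for downloaded assets (max_cache_size in bytes)."""

    def __init__(
        self,
        root: str,
        fetch: Callable[[str], bytes],
        max_cache_size: Optional[int] = None,
    ) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch
        self.max_cache_size = max_cache_size
        # filename -> size in bytes, ordered from least to most recently used
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def _path_for(self, href: str) -> Path:
        # Hash the URL so any href maps to a safe, stable filename
        return self.root / hashlib.sha256(href.encode()).hexdigest()

    def get(self, href: str) -> Path:
        path = self._path_for(href)
        if path.exists():
            # Cache hit (possibly left over from an earlier run): mark as most recently used
            self._entries[path.name] = path.stat().st_size
            self._entries.move_to_end(path.name)
            return path
        # Cache miss: download, record, then trim the cache if it is too large
        path.write_bytes(self.fetch(href))
        self._entries[path.name] = path.stat().st_size
        self._evict()
        return path

    def _evict(self) -> None:
        if self.max_cache_size is None:
            return
        total = sum(self._entries.values())
        # Delete least-recently-used files until under the limit, always keeping the newest one
        while total > self.max_cache_size and len(self._entries) > 1:
            name, size = self._entries.popitem(last=False)
            (self.root / name).unlink(missing_ok=True)
            total -= size
```

Keeping the size bookkeeping in an `OrderedDict` makes both "mark as recently used" and "evict the oldest" cheap operations; a real implementation might instead scan the directory on startup.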
I would be really interested in taking on this task!
All yours :) (I actually had you in mind when writing this; it's a bit more interesting than the other dataset work!) Feel free to message me if you want to discuss details.
@adamjstewart, this would involve taking on some dependencies (pystac_client, planetary-computer, maybe stackstac).
We can make those deps optional if we need to.
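A common pattern for optional dependencies is to import them only inside the code that needs them and to fail with an install hint otherwise, so importing the rest of the package still works without them. A minimal sketch; the helper name `require_optional` is hypothetical and not an existing torchgeo function:

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, pip_name: str) -> ModuleType:
    """Hypothetical helper: import an optional dependency or raise a helpful ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is not installed and is required for this dataset. "
            f"Try: pip install {pip_name}"
        ) from err
```

Inside a `STACAPIDataset.__init__`, this could be called as `require_optional("pystac_client", "pystac-client")` or `require_optional("planetary_computer", "planetary-computer")`, so only users of the STAC dataset need those packages installed.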
Nice potential feature! Is anyone still planning to work on it?